## Import pandas

In [1]:
import pandas as pd

# 1. Read Data

###### Read csv data
A comma-separated values (CSV) file is a delimited text file that uses commas to separate values; each line of the file is a data record (https://en.wikipedia.org/wiki/Comma-separated_values). In pandas we use the pandas.read_csv() function to read CSV data. read_csv() accepts many arguments, including filepath_or_buffer, which specifies the file path; sep, which sets the delimiter; and engine, which selects the parser: C is faster but has fewer features, while Python is slower but more feature-complete.
usecols selects the columns to read, nrows limits the number of rows to read, and chunksize sets how many rows are returned at a time so that a large file can be read in pieces.
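
As a minimal sketch of the arguments described above, the cell below reads a small in-memory CSV instead of a file on disk. The data, the semicolon delimiter, and the variable names are made up for illustration and are not part of the Titanic dataset.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = """id;name;score
1;Ann;90
2;Bob;85
3;Cid;70
4;Dee;60
"""

# sep sets the delimiter, usecols picks the columns, nrows caps the rows read.
subset_df = pd.read_csv(
    io.StringIO(csv_text), sep=";", usecols=["id", "score"], nrows=3
)
print(subset_df)

# With chunksize, read_csv returns an iterator of DataFrames
# rather than a single DataFrame.
chunk_sizes = [
    len(chunk)
    for chunk in pd.read_csv(io.StringIO(csv_text), sep=";", chunksize=2)
]
print(chunk_sizes)
```

Reading in chunks keeps only one piece of the file in memory at a time, which helps when the full file does not fit in RAM.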


In [2]:
titanic_df=pd.read_csv('titanic.csv')
titanic_df.head()

Unnamed: 0,Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
0,0,3,Mr. Owen Harris Braund,male,22.0,1,0,7.25
1,1,1,Mrs. John Bradley (Florence Briggs Thayer) Cum...,female,38.0,1,0,71.2833
2,1,3,Miss. Laina Heikkinen,female,26.0,0,0,7.925
3,1,1,Mrs. Jacques Heath (Lily May Peel) Futrelle,female,35.0,1,0,53.1
4,0,3,Mr. William Henry Allen,male,35.0,0,0,8.05


###### Read Excel data

In [3]:
titanic_excel_df=pd.read_excel('titanic.xlsx', sheet_name='Sheet1')
titanic_excel_df

Unnamed: 0,Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
0,0,3,Mr. Owen Harris Braund,male,22.0,1,0,7.2500
1,1,1,Mrs. John Bradley (Florence Briggs Thayer) Cum...,female,38.0,1,0,71.2833
2,1,3,Miss. Laina Heikkinen,female,26.0,0,0,7.9250
3,1,1,Mrs. Jacques Heath (Lily May Peel) Futrelle,female,35.0,1,0,53.1000
4,0,3,Mr. William Henry Allen,male,35.0,0,0,8.0500
...,...,...,...,...,...,...,...,...
882,0,2,Rev. Juozas Montvila,male,27.0,0,0,13.0000
883,1,1,Miss. Margaret Edith Graham,female,19.0,0,0,30.0000
884,0,3,Miss. Catherine Helen Johnston,female,7.0,1,2,23.4500
885,1,1,Mr. Karl Howell Behr,male,26.0,0,0,30.0000


###### Read html file
HyperText Markup Language (HTML) is the standard markup language for documents designed to be displayed in a web browser (https://en.wikipedia.org/wiki/HTML). pandas.read_html() extracts the tables found in an HTML page and returns them as a list of DataFrames, which is why the example below selects one table by its position in the list.

In [4]:
gdp_df=pd.read_html('https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)#Table')
gdp_df[2].head()

Unnamed: 0_level_0,Country/Territory,Region,IMF[1],IMF[1],United Nations[12],United Nations[12],World Bank[13][14],World Bank[13][14]
Unnamed: 0_level_1,Country/Territory,Region,Estimate,Year,Estimate,Year,Estimate,Year
0,United States,Americas,22675271.0,2021,21433226,2019,20936600.0,2020
1,China,Asia,16642318.0,[n 2]2021,14342933,[n 3]2019,14722731.0,2020
2,Japan,Asia,5378136.0,2021,5082465,2019,4975415.0,2020
3,Germany,Europe,4319286.0,2021,3861123,2019,3806060.0,2020
4,United Kingdom,Europe,3124650.0,2021,2826441,2019,2707744.0,2020


###### Read JSON data

JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write (https://www.json.org/json-en.html). The pd.read_json() function converts JSON data to a pandas DataFrame. It accepts arguments such as path_or_buf, which is the location of the data source; chunksize, which is useful when reading large line-delimited files in pieces (it requires lines=True); and encoding, which names the text encoding used to decode the data.
orient describes the expected JSON layout and can be any of index, split, records, columns or values. Check the pandas documentation for more details: https://pandas-docs.github.io/pandas-docs-travis/reference/api/pandas.read_json.html
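
To illustrate how orient changes the layout read_json() expects, here is a small sketch that parses the same two rows written in the records and split layouts. The JSON strings and the column names currency and rate are assumptions made for this example; they are not the payload returned by the exchange-rate API used below.

```python
import io
import pandas as pd

# The same two rows, once as a list of records and once in "split" layout.
records_json = '[{"currency": "AED", "rate": 3.67}, {"currency": "AFN", "rate": 96.34}]'
split_json = (
    '{"columns": ["currency", "rate"], "index": [0, 1], '
    '"data": [["AED", 3.67], ["AFN", 96.34]]}'
)

# Wrapping the text in StringIO lets read_json treat it like a file.
records_df = pd.read_json(io.StringIO(records_json), orient="records")
split_df = pd.read_json(io.StringIO(split_json), orient="split")

print(records_df)
print(split_df)
```

Both calls should produce the same two-row table; only the shape of the input text differs.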


In [5]:
# json_df=pd.read_json("https://raw.githubusercontent.com/BindiChen/machine-learning/master/data-analysis/027-pandas-convert-json/data/simple.json")
json_exchange_rate=pd.read_json("https://api.exchangerate-api.com/v4/latest/USD")
json_exchange_rate=json_exchange_rate[['provider','base','date','time_last_updated','rates']].head()
json_exchange_rate

Unnamed: 0,provider,base,date,time_last_updated,rates
AED,https://www.exchangerate-api.com,USD,2021-12-02,1638403201,3.67
AFN,https://www.exchangerate-api.com,USD,2021-12-02,1638403201,96.34
ALL,https://www.exchangerate-api.com,USD,2021-12-02,1638403201,107.32
AMD,https://www.exchangerate-api.com,USD,2021-12-02,1638403201,487.66
ANG,https://www.exchangerate-api.com,USD,2021-12-02,1638403201,1.79


###### Parquet
Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language (https://parquet.apache.org/). The Parquet format provides a partitioned binary columnar serialization for data frames, and pandas.read_parquet() reads a Parquet file into a pandas DataFrame.

First we will create a Parquet file and then demonstrate how to read it in pandas.

In [6]:
# Uncomment this command to install pyarrow
# !pip install pyarrow

In [7]:
titanic_df.to_parquet('parquet_sample_data.parquet', engine='pyarrow')

In [8]:
# Read parquet file
parquet_df=pd.read_parquet('parquet_sample_data.parquet', engine='pyarrow')
parquet_df.head()

Unnamed: 0,Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
0,0,3,Mr. Owen Harris Braund,male,22.0,1,0,7.25
1,1,1,Mrs. John Bradley (Florence Briggs Thayer) Cum...,female,38.0,1,0,71.2833
2,1,3,Miss. Laina Heikkinen,female,26.0,0,0,7.925
3,1,1,Mrs. Jacques Heath (Lily May Peel) Futrelle,female,35.0,1,0,53.1
4,0,3,Mr. William Henry Allen,male,35.0,0,0,8.05


###### Read Pickle file
Pickle is a file format that serializes a Python object to a byte stream. In machine learning, pickle files are often used to share models and deploy them in production. Let's first create a pickle file and then demonstrate how to read it in pandas.

In [9]:
json_exchange_rate.to_pickle('pickle_sample_data.pkl')
pickle_df=pd.read_pickle('pickle_sample_data.pkl')
pickle_df.head()

Unnamed: 0,provider,base,date,time_last_updated,rates
AED,https://www.exchangerate-api.com,USD,2021-12-02,1638403201,3.67
AFN,https://www.exchangerate-api.com,USD,2021-12-02,1638403201,96.34
ALL,https://www.exchangerate-api.com,USD,2021-12-02,1638403201,107.32
AMD,https://www.exchangerate-api.com,USD,2021-12-02,1638403201,487.66
ANG,https://www.exchangerate-api.com,USD,2021-12-02,1638403201,1.79


###### Read SQL Data
SQL is a popular language for querying and manipulating data in databases. In data science and analytics, understanding SQL is a key skill for working with data efficiently and comfortably. Modern databases such as Oracle, MSSQL, Google BigQuery (GBQ) and Amazon Redshift have advanced beyond being data storage containers and now provide capabilities such as building in-database machine learning models through SQL. Pandas enables us to read data from a database through SQL. To do so, we first need to install the Python driver for the respective database.

In [10]:
# Let's first create an sqlite database and save titanic dataset to it
from sqlalchemy import create_engine
engine = create_engine('sqlite:///:memory:')
titanic_df.to_sql('titanic_sqlite_data', engine, chunksize=100)

In [11]:
titanic_sqlite_df=pd.read_sql_table('titanic_sqlite_data', engine)
titanic_sqlite_df.head()

Unnamed: 0,index,Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
0,0,0,3,Mr. Owen Harris Braund,male,22.0,1,0,7.25
1,1,1,1,Mrs. John Bradley (Florence Briggs Thayer) Cum...,female,38.0,1,0,71.2833
2,2,1,3,Miss. Laina Heikkinen,female,26.0,0,0,7.925
3,3,1,1,Mrs. Jacques Heath (Lily May Peel) Futrelle,female,35.0,1,0,53.1
4,4,0,3,Mr. William Henry Allen,male,35.0,0,0,8.05
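
read_sql_table() loads a whole table. When only part of it is needed, a SQL query can be passed to pandas instead. The sketch below assumes a throwaway in-memory SQLite database opened with the standard-library sqlite3 module, and a hypothetical table named passengers holding three rows copied from the output above.

```python
import sqlite3
import pandas as pd

# Throwaway in-memory database; the table name "passengers" is hypothetical.
conn = sqlite3.connect(":memory:")
passengers = pd.DataFrame(
    {
        "Name": [
            "Mr. Owen Harris Braund",
            "Miss. Laina Heikkinen",
            "Mrs. Jacques Heath (Lily May Peel) Futrelle",
        ],
        "Pclass": [3, 3, 1],
        "Fare": [7.25, 7.925, 53.1],
    }
)
passengers.to_sql("passengers", conn, index=False)

# The filter runs inside the database; params fills the "?" placeholder.
first_class_df = pd.read_sql_query(
    "SELECT Name, Fare FROM passengers WHERE Pclass = ?", conn, params=(1,)
)
conn.close()
print(first_class_df)
```

Passing values through params rather than formatting them into the query string lets the driver handle quoting.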


# 2. Save Data

###### Save data to csv

In [12]:
json_exchange_rate.to_csv("Exchange_Rate.csv")

###### Save data to excel

In [13]:
json_exchange_rate.to_excel("Exchange_Rate.xlsx")

###### Save data to html

In [14]:
titanic_df.head().to_html()

'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>Survived</th>\n      <th>Pclass</th>\n      <th>Name</th>\n      <th>Sex</th>\n      <th>Age</th>\n      <th>Siblings/Spouses Aboard</th>\n      <th>Parents/Children Aboard</th>\n      <th>Fare</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>0</td>\n      <td>3</td>\n      <td>Mr. Owen Harris Braund</td>\n      <td>male</td>\n      <td>22.0</td>\n      <td>1</td>\n      <td>0</td>\n      <td>7.2500</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>1</td>\n      <td>1</td>\n      <td>Mrs. John Bradley (Florence Briggs Thayer) Cumings</td>\n      <td>female</td>\n      <td>38.0</td>\n      <td>1</td>\n      <td>0</td>\n      <td>71.2833</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>1</td>\n      <td>3</td>\n      <td>Miss. Laina Heikkinen</td>\n      <td>female</td>\n      <td>26.0</td>\n      <td>0</td>\n      <td>0</td>\n      <td
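
Called without arguments, to_html() returns the markup as a string, as in the output above. To actually write a file, a path can be passed as the first argument. The following is a minimal sketch that assumes a temporary directory and a made-up two-row frame rather than the Titanic data.

```python
import os
import tempfile
import pandas as pd

# Small made-up frame; out_path points into a temporary directory.
sample_df = pd.DataFrame({"Pclass": [3, 1], "Fare": [7.25, 53.1]})
out_path = os.path.join(tempfile.mkdtemp(), "sample.html")

# With a path as the first argument, to_html writes the file.
sample_df.to_html(out_path)

# Read the file back to confirm it holds an HTML table.
with open(out_path, encoding="utf-8") as fh:
    html_text = fh.read()
print(html_text[:60])
```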

###### Save data to json

In [15]:
titanic_df.head().to_json(orient='columns')

'{"Survived":{"0":0,"1":1,"2":1,"3":1,"4":0},"Pclass":{"0":3,"1":1,"2":3,"3":1,"4":3},"Name":{"0":"Mr. Owen Harris Braund","1":"Mrs. John Bradley (Florence Briggs Thayer) Cumings","2":"Miss. Laina Heikkinen","3":"Mrs. Jacques Heath (Lily May Peel) Futrelle","4":"Mr. William Henry Allen"},"Sex":{"0":"male","1":"female","2":"female","3":"female","4":"male"},"Age":{"0":22.0,"1":38.0,"2":26.0,"3":35.0,"4":35.0},"Siblings\\/Spouses Aboard":{"0":1,"1":1,"2":0,"3":1,"4":0},"Parents\\/Children Aboard":{"0":0,"1":0,"2":0,"3":0,"4":0},"Fare":{"0":7.25,"1":71.2833,"2":7.925,"3":53.1,"4":8.05}}'
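
The orient argument also controls the layout that to_json() writes, and passing a path as the first argument saves the result to disk instead of returning a string. The sketch below compares two layouts on a made-up two-row frame; the temporary file location is an assumption for illustration.

```python
import json
import os
import tempfile
import pandas as pd

sample_df = pd.DataFrame({"Pclass": [3, 1], "Fare": [7.25, 53.1]})

# orient="columns" nests values under each column name, keyed by row label.
by_column = json.loads(sample_df.to_json(orient="columns"))
# orient="records" produces one object per row.
by_record = json.loads(sample_df.to_json(orient="records"))
print(by_column)
print(by_record)

# Passing a path writes the JSON to a file.
json_path = os.path.join(tempfile.mkdtemp(), "sample.json")
sample_df.to_json(json_path, orient="records")
with open(json_path, encoding="utf-8") as fh:
    saved = json.load(fh)
```

The records layout is usually the easier one to consume from other tools, since each row is a self-contained object.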

###### Save data to Parquet

In [16]:
titanic_df.to_parquet('parquet_titanic_data.parquet', engine='pyarrow')

###### Save data to pickle

In [17]:
titanic_df.to_pickle('pickle_titanic_data.pkl')

###### Save data to SQLite

In [18]:
json_exchange_rate.to_sql('exchange_rate_sqlite_data', engine, chunksize=100)