
# Reading / Writing data

Pages 125-144

Pandas can read and write many file formats and data sources out of the box, including CSV, Excel, SQL databases, JSON and Parquet.

## Reading data in text format
- The `pd.read_csv()` function reads a file and stores its contents in a DataFrame
  - With the default options, the first row is used as the header and the separator is a comma
  - The file can be a path on local disk or a URL

In [0]:
import pandas as pd

#pd.read_csv("./Sacramentorealestatetransactions.csv")
housing = pd.read_csv("http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv")

In [4]:
housing.head()

|   | street | city | zip | state | beds | baths | sq__ft | type | sale_date | price | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3526 HIGH ST | SACRAMENTO | 95838 | CA | 2 | 1 | 836 | Residential | Wed May 21 00:00:00 EDT 2008 | 59222 | 38.631913 | -121.434879 |
| 1 | 51 OMAHA CT | SACRAMENTO | 95823 | CA | 3 | 1 | 1167 | Residential | Wed May 21 00:00:00 EDT 2008 | 68212 | 38.478902 | -121.431028 |
| 2 | 2796 BRANCH ST | SACRAMENTO | 95815 | CA | 2 | 1 | 796 | Residential | Wed May 21 00:00:00 EDT 2008 | 68880 | 38.618305 | -121.443839 |
| 3 | 2805 JANETTE WAY | SACRAMENTO | 95815 | CA | 2 | 1 | 852 | Residential | Wed May 21 00:00:00 EDT 2008 | 69307 | 38.616835 | -121.439146 |
| 4 | 6001 MCMAHON DR | SACRAMENTO | 95824 | CA | 2 | 1 | 797 | Residential | Wed May 21 00:00:00 EDT 2008 | 81900 | 38.51947 | -121.435768 |
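
`pd.read_csv()` has many more options than the defaults used above. The sketch below uses a few rows modeled on the housing data (with the dates rewritten in ISO format, an assumption made for simplicity) and reads them from an in-memory `io.StringIO` buffer to show `usecols`, `nrows` and `parse_dates`:

```python
import io

import pandas as pd

# A few rows modeled on the housing file; ISO dates are a simplifying assumption
csv_text = """street,city,beds,price,sale_date
3526 HIGH ST,SACRAMENTO,2,59222,2008-05-21
51 OMAHA CT,SACRAMENTO,3,68212,2008-05-21
2796 BRANCH ST,SACRAMENTO,2,68880,2008-05-21
"""

subset = pd.read_csv(
    io.StringIO(csv_text),                     # any file-like object works, not only paths and URLs
    usecols=["street", "price", "sale_date"],  # load only these columns (kept in file order)
    nrows=2,                                   # stop after the first two data rows
    parse_dates=["sale_date"],                 # convert this column to datetime64
)
print(subset.dtypes)
```

On large files, `usecols` and `nrows` are a cheap way to inspect the data before loading all of it.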


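Both `pd.read_csv()` and `pd.read_table()` accept a `sep` argument that sets the separator; the main difference between them is the default separator, a comma for `read_csv()` and a tab for `read_table()`.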

In [5]:
%%writefile input_data.txt
a|b|c|d|message
1|2|3|4|hello
5|6|7|8|world
9|10|11|12|foo

Writing input_data.txt


In [6]:
pd.read_csv("input_data.txt", sep="|")

|   | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | hello |
| 1 | 5 | 6 | 7 | 8 | world |
| 2 | 9 | 10 | 11 | 12 | foo |
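
A minimal sketch of the two separator behaviors, using small hypothetical inputs held in memory: `read_table()` splits on tabs without any `sep` argument, and `sep` can also be a regular expression such as `r"\s+"` for columns separated by runs of spaces:

```python
import io

import pandas as pd

tab_text = "a\tb\n1\t2\n3\t4\n"    # hypothetical tab-separated data
ws_text = "a   b\n1  2\n3     4\n"  # the same values, separated by varying numbers of spaces

# read_table() assumes tab-separated values unless told otherwise
df_tab = pd.read_table(io.StringIO(tab_text))

# A regular-expression separator: one or more whitespace characters
df_ws = pd.read_csv(io.StringIO(ws_text), sep=r"\s+")

print(df_tab.equals(df_ws))
```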


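The `header` parameter indicates which row holds the column names; `header=None` means the file has no header row, so pandas numbers the columns 0, 1, 2, and so on, and reads the first line as data.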

In [8]:
pd.read_csv("input_data.txt", sep="|", header=None)

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | a | b | c | d | message |
| 1 | 1 | 2 | 3 | 4 | hello |
| 2 | 5 | 6 | 7 | 8 | world |
| 3 | 9 | 10 | 11 | 12 | foo |
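
When a file truly has no header row, `header=None` is usually combined with `names` to supply the column names. A sketch using a hypothetical header-less variant of `input_data.txt`, held in memory:

```python
import io

import pandas as pd

# Hypothetical variant of input_data.txt without its header line
no_header = "1|2|3|4|hello\n5|6|7|8|world\n9|10|11|12|foo\n"

df = pd.read_csv(
    io.StringIO(no_header),
    sep="|",
    header=None,                            # the first line is data, not column names
    names=["a", "b", "c", "d", "message"],  # supply the column names explicitly
)
print(df)
```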


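The `na_values` parameter lists additional strings to treat as missing values (NaN). Strings such as "NA", "NaN" and empty fields are already recognized by default, so here it only makes the intent explicit.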

In [9]:
%%writefile input_data.txt
a|b|c|d|message
1|2|3|NA|hello
5|6|7|8|world
9|NA|11|12|foo

Overwriting input_data.txt


In [10]:
pd.read_table("input_data.txt", sep="|", na_values=["NA"])

|   | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2.0 | 3 | NaN | hello |
| 1 | 5 | 6.0 | 7 | 8.0 | world |
| 2 | 9 | NaN | 11 | 12.0 | foo |
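
`na_values` also accepts a dictionary that applies sentinels per column, and `keep_default_na=False` switches off the built-in list of missing-value strings. A small sketch with made-up data in which `"-"` and `"missing"` are the assumed sentinels:

```python
import io

import pandas as pd

text = "a|b|message\n1|-|hello\n2|5|missing\n3|NA|foo\n"

# Per-column sentinels: "-" is missing only in column b, "missing" only in message.
# The default strings (such as "NA") still apply on top of them.
df = pd.read_csv(io.StringIO(text), sep="|",
                 na_values={"b": ["-"], "message": ["missing"]})

# keep_default_na=False disables the built-in list, so "NA" stays a plain string
raw = pd.read_csv(io.StringIO(text), sep="|", keep_default_na=False)

print(df)
print(raw)
```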


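The `pd.read_fwf()` function reads files whose columns occupy fixed character positions rather than being separated by a delimiter; by default it infers the column boundaries from the first rows of the file.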

In [16]:
%%writefile input_data.txt
a b   c   d   message
1 2   223 NA  hello
5 6   7   8   world
9 10  11  12  foo

Overwriting input_data.txt


In [17]:
pd.read_fwf("input_data.txt")

|   | a | b | c | d | message |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 223 | NaN | hello |
| 1 | 5 | 6 | 7 | 8.0 | world |
| 2 | 9 | 10 | 11 | 12.0 | foo |
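
When the boundaries cannot be inferred reliably, they can be given explicitly with `widths` (or with `colspecs`, a list of `(start, end)` pairs). A sketch over hypothetical fixed-width records, assuming a 3-character id, a 7-character space-padded name and a 2-character age:

```python
import io

import pandas as pd

# Hypothetical records: id (3 chars) + name (7 chars, space-padded) + age (2 chars)
text = "001Alice  30\n002Bob    25\n"

df = pd.read_fwf(
    io.StringIO(text),
    widths=[3, 7, 2],             # explicit field widths instead of inference
    header=None,                  # the records have no header line
    names=["id", "name", "age"],
    dtype={"id": str},            # keep the leading zeros of the id
)
print(df)
```

The padding around each field is stripped, so `name` holds `"Alice"` and `"Bob"` rather than padded strings.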


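The `converters` parameter maps column names to functions that are applied to each raw string value of that column while parsing. Here it turns numbers written in European style (`.` as the thousands separator, `,` as the decimal separator) into floats.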

In [22]:
%%writefile input_data.txt
col1|col2|col3
one|1.232,12|a
two|2.000,32|b

Overwriting input_data.txt


In [23]:
pd.read_csv("input_data.txt", sep="|",
            converters={"col2":lambda value: float(value.replace(".","").replace(",","."))})

|   | col1 | col2 | col3 |
|---|---|---|---|
| 0 | one | 1232.12 | a |
| 1 | two | 2000.32 | b |
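
For this particular number format a custom converter is not strictly needed, since `read_csv()` also accepts `thousands` and `decimal` arguments. A sketch of the same parse using those options instead of a lambda, with the file contents held in memory:

```python
import io

import pandas as pd

text = "col1|col2|col3\none|1.232,12|a\ntwo|2.000,32|b\n"

# thousands/decimal describe the number format, so col2 is parsed as float directly
df = pd.read_csv(io.StringIO(text), sep="|", thousands=".", decimal=",")
print(df.dtypes)
```

`converters` remains the more general tool when a column needs arbitrary cleanup that these two options cannot express.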


## Reading data from Excel
- `pd.read_excel()` reads an Excel file into a DataFrame (for `.xlsx` files, an engine such as openpyxl must be installed)
- To read several sheets of the same Excel file, it is convenient to open the file once with `pd.ExcelFile()` and then read each sheet from that object

In [0]:
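# df = pd.read_excel("Example_File.xlsx")                              # first sheet by default
# df = pd.read_excel("Example_File.xlsx", sheet_name="Example_Sheet")  # a specific sheet

# xlsx = pd.ExcelFile("Example_File.xlsx")                             # open the file once...
# df = pd.read_excel(xlsx, sheet_name="Example_Sheet")                 # ...then read sheets from it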

## Reading data from a JSON file

- The `pd.read_json()` function reads data in JSON format into a DataFrame; with a list of objects, as below, each object becomes a row and each key becomes a column

In [25]:
%%writefile input_data.json
[ {"a": 1, "b":2, "c":3},
  {"a": 4, "b":5, "c":6},
  {"a": 7, "b":8, "c":9}]

Writing input_data.json


In [26]:
pd.read_json("input_data.json")

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |
| 2 | 7 | 8 | 9 |
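
The JSON layout that `read_json()` expects is controlled by its `orient` argument. A sketch that writes the same three-row frame in two layouts and reads one of them back; `io.StringIO` is used because recent pandas versions deprecate passing a literal JSON string to `read_json()`:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8], "c": [3, 6, 9]})

# orient="records": a list of objects, the same layout as input_data.json
records_json = df.to_json(orient="records")

# orient="columns" (the DataFrame default): values nested under each column name
columns_json = df.to_json(orient="columns")

# Read the records layout back into a DataFrame
back = pd.read_json(io.StringIO(records_json), orient="records")
print(records_json)
print(columns_json)
```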


Alternatively, parse the file with Python's built-in `json` module and pass the resulting list of dictionaries to the `pd.DataFrame` constructor

In [29]:
import json 

with open("input_data.json") as json_data:
  result = json.load(json_data)

pd.DataFrame(result, columns=["a", "b", "c"])

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |
| 2 | 7 | 8 | 9 |


## Reading data from a web service
- To read data from a web service, we can use the `requests` library to fetch the response and then build a DataFrame from the decoded JSON

In [31]:
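import requests

url = "https://api.github.com/repos/pandas-dev/pandas/issues"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # raise an HTTPError on failure instead of leaving `dataframe` undefined

data = resp.json()  # a list of dictionaries, one per issue
dataframe = pd.DataFrame(data, columns=["number", "title", "labels", "state"])

dataframe.head()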

|   | number | title | labels | state |
|---|---|---|---|---|
| 0 | 33750 | CLN: Remove is_null_period | [] | open |
| 1 | 33749 | BUG: Fix mixed datetime dtype inference | [{'id': 76865106, 'node_id': 'MDU6TGFiZWw3Njg2... | open |
| 2 | 33748 | BUG: to_hdf and HDFStore raise KeyError for Da... | [{'id': 76811, 'node_id': 'MDU6TGFiZWw3NjgxMQ=... | open |
| 3 | 33746 | QST: Does ExcelWriter accept a file object? Wh... | [{'id': 1954720290, 'node_id': 'MDU6TGFiZWwxOT... | open |
| 4 | 33745 | BUG: support skew function for custom BaseInde... | [{'id': 1045950827, 'node_id': 'MDU6TGFiZWwxMD... | open |
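
The `labels` column above holds nested lists of dictionaries. A sketch of two ways to flatten it, using hypothetical records shaped like the GitHub response (each label is reduced to a `name` key, an assumption for brevity): `apply()` with a list comprehension keeps one row per issue, while `pd.json_normalize()` produces one row per label:

```python
import pandas as pd

# Hypothetical records shaped like the GitHub issues response
records = [
    {"number": 1, "title": "BUG: example", "state": "open",
     "labels": [{"name": "Bug"}, {"name": "IO CSV"}]},
    {"number": 2, "title": "DOC: example", "state": "open", "labels": []},
]

issues = pd.DataFrame(records, columns=["number", "title", "labels", "state"])

# Option 1: one row per issue, each list of label dicts reduced to a list of names
issues["label_names"] = issues["labels"].apply(
    lambda label_list: [label["name"] for label in label_list]
)

# Option 2: one row per (issue, label) pair, carrying the issue number along;
# issues without labels produce no rows here
labels = pd.json_normalize(records, record_path="labels", meta=["number"])
print(issues)
print(labels)
```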


## Reading data from HTML
- The `pd.read_html()` function parses the `<table>` elements of an HTML page (a URL, a file, or HTML text); it requires lxml, or BeautifulSoup4 together with html5lib
- It returns a list of DataFrames, one per table found on the page

In [33]:
dataframes = pd.read_html("https://fdic.gov/bank/individual/failed/banklist.html")
dataframes[0].head()

|   | Bank Name | City | ST | CERT | Acquiring Institution | Closing Date |
|---|---|---|---|---|---|---|
| 0 | The First State Bank | Barboursville | WV | 14361 | MVB Bank, Inc. | April 3, 2020 |
| 1 | Ericson State Bank | Ericson | NE | 18265 | Farmers and Merchants Bank | February 14, 2020 |
| 2 | City National Bank of New Jersey | Newark | NJ | 21111 | Industrial Bank | November 1, 2019 |
| 3 | Resolute Bank | Maumee | OH | 58317 | Buckeye State Bank | October 25, 2019 |
| 4 | Louisa Community Bank | Louisa | KY | 58112 | Kentucky Farmers Bank Corporation | October 25, 2019 |


## Writing data
- Once a DataFrame is in memory, we can write it to disk with methods such as:
  - `dataframe.to_csv("file.csv")`
  - `dataframe.to_excel("file.xlsx")`
  - `dataframe.to_json("file.json")`

In [0]:
# ExcelWriter.save() was removed in pandas 2.0; the with-block closes (and saves) the file
with pd.ExcelWriter("file.xlsx") as xlsx:
    dataframe.to_excel(xlsx)
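
A common surprise when writing CSV files is that the index is written as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. A sketch using `to_csv()` without a path, which returns the CSV text as a string instead of writing a file, on a small made-up frame:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 4, 7], "message": ["hello", "world", "foo"]})

# With no path argument, to_csv() returns the CSV text as a string
with_index = df.to_csv()
without_index = df.to_csv(index=False)

# The written index comes back as an extra column named "Unnamed: 0"...
reread = pd.read_csv(io.StringIO(with_index))

# ...unless it is skipped on write, or restored as the index on read
clean = pd.read_csv(io.StringIO(without_index))
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(list(reread.columns))
```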

## Reading data from a database
The `pd.read_sql()` function loads the result of a SQL query, or an entire table, into a DataFrame; the database connection is typically created with SQLAlchemy's `create_engine()`

In [0]:
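from sqlalchemy import create_engine

# engine = create_engine("sqlite:///:memory:")    # an in-memory SQLite database
# pd.read_sql("SELECT * FROM my_table;", engine)  # result of any SQL query ("my_table" is a placeholder)
# pd.read_sql_table("my_table", engine)           # an entire table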
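
For a self-contained experiment without SQLAlchemy, pandas also accepts a standard-library `sqlite3` connection in `read_sql()` and `to_sql()` (`read_sql_table()`, by contrast, requires SQLAlchemy). A sketch with an in-memory database and a small made-up frame; the table name `my_table` is a placeholder:

```python
import sqlite3

import pandas as pd

# In-memory SQLite database; pandas accepts a sqlite3 connection directly
con = sqlite3.connect(":memory:")

df = pd.DataFrame({"a": [1, 4, 7], "message": ["hello", "world", "foo"]})
df.to_sql("my_table", con, index=False)  # "my_table" is a placeholder name

# Parameterized query: sqlite3 uses "?" placeholders, filled from params
result = pd.read_sql("SELECT a, message FROM my_table WHERE a > ?", con, params=(1,))
con.close()
print(result)
```

Passing values through `params` rather than formatting them into the SQL string lets the driver handle quoting.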

# Exercise 23
- Load the data from the following URL into a DataFrame called "df1":
  https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv


- Load the data from the following URL into a DataFrame called "df2":
  https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user
  - The column 'user_id' must be the index of the DataFrame

In [76]:
# read_table() splits on tabs by default, which matches the .tsv file
df1 = pd.read_table("https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv")

df1.head()

|   | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | \$2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | \$3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | \$3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | \$2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | \$16.98 |


In [73]:
df2 = pd.read_table("https://raw.githubusercontent.com/justmarkham/DAT8/master/data/u.user",
                    sep="|",              # fields are pipe-separated
                    index_col="user_id")  # use user_id as the row index

df2.head()

| user_id | age | gender | occupation | zip_code |
|---|---|---|---|---|
| 1 | 24 | M | technician | 85711 |
| 2 | 53 | F | other | 94043 |
| 3 | 23 | M | writer | 32067 |
| 4 | 24 | M | technician | 43537 |
| 5 | 33 | F | other | 15213 |


# Exercise 24
- Write the DataFrames from the previous exercise to an Excel file called "Data.xlsx"
  - Save 'df1' on a sheet called 'chipotle' (without the index)
  - Save 'df2' on another sheet called 'user'
- Read the 'user' sheet of "Data.xlsx" back into a new DataFrame

# Exercise 25
- Read the data from the following web service into a DataFrame:
https://sedeaplicaciones.minetur.gob.es/ServiciosRESTCarburantes/PreciosCarburantes/EstacionesTerrestres/

