### Reading a CSV File

###

- Syntax:   pandas.read_csv("file_name.csv")
- If the file starts with rows we don't need, we'll use the 'skiprows' argument
  
  Syntax:   pandas.read_csv("file_name.csv", skiprows = n)  where n is the number of rows to skip from the top

- We can use header in place of skiprows: header = n tells pandas that row n (counting from 0) holds the column names, and the rows above it are ignored. ex: pandas.read_csv("file_name.csv", header = n)

- If the file has no header row, we'll use header = None and supply our own column names with names

  Syntax: pandas.read_csv("file_name.csv", header = None, names = ["column_1 name", "column_2 name"...] )
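
The three options above can be tried without any file on disk. The sketch below is only an illustration: the CSV text, the junk first line, and the variable names are made up for the demo, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical file contents: one junk line, then the real header row.
raw_with_junk = io.StringIO(
    "exported by some tool\n"
    "tickers,eps,revenue\n"
    "GOOGL,27.82,87\n"
    "WMT,4.61,484\n"
)

# skiprows=1 drops the junk line; the next line becomes the header.
df_skip = pd.read_csv(raw_with_junk, skiprows=1)

# header=1 says "row 1 holds the column names"; the row above it is ignored.
raw_with_junk.seek(0)  # rewind the in-memory file before reading it again
df_header = pd.read_csv(raw_with_junk, header=1)

# Hypothetical file with no header row at all: supply the names ourselves.
raw_no_header = io.StringIO("GOOGL,27.82,87\nWMT,4.61,484\n")
df_named = pd.read_csv(
    raw_no_header, header=None, names=["tickers", "eps", "revenue"]
)
```

Here df_skip and df_header should hold the same data, which is why header can be used in place of skiprows when the only goal is to reach the header row.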

In [2]:
import pandas as pd

df = pd.read_csv("stock_data.csv")
df

Unnamed: 0,tickers,eps,revenue,price,people
0,GOOGL,27.82,87,845,larry page
1,WMT,4.61,484,65,n.a.
2,MSFT,-1,85,64,bill gates
3,RIL,not available,50,1023,mukesh ambani
4,TATA,5.6,-1,n.a.,ratan tata


###

- We'll use the nrows argument to read only the first n data rows (the header row is not counted)

In [3]:
import pandas as pd

df = pd.read_csv("stock_data.csv", nrows=3)
df

Unnamed: 0,tickers,eps,revenue,price,people
0,GOOGL,27.82,87,845,larry page
1,WMT,4.61,484,65,n.a.
2,MSFT,-1.0,85,64,bill gates


###

- In order to treat certain values as NaN (Not a Number, pandas' marker for missing data), we'll use the na_values argument

Syntax:   pd.read_csv("File_name.csv", na_values = ["Values to be changed to NaN"])

- We use this argument to clean the data while loading it

In [4]:
import pandas as pd

df = pd.read_csv("stock_data.csv", na_values=["not available", "n.a."])
df

Unnamed: 0,tickers,eps,revenue,price,people
0,GOOGL,27.82,87,845.0,larry page
1,WMT,4.61,484,65.0,
2,MSFT,-1.0,85,64.0,bill gates
3,RIL,,50,1023.0,mukesh ambani
4,TATA,5.6,-1,,ratan tata


###

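- Now, we'll supply a dictionary in place of a list to change specific values to NaN only in specific columns

Syntax:  pd.read_csv("stock_data.csv", na_values = {
    
    'column name' : ["value1", "value2"...]
    
})

- Whenever one of the specified values is found in the mentioned column, it is replaced by NaN. The same value in any other column is left as it is (e.g. -1 below becomes NaN in revenue but stays -1 in eps)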


In [5]:
import pandas as pd

df = pd.read_csv("stock_data.csv", na_values = {
    
    'eps' : ["not available", "n.a."],
    'revenue' : ["not available", "n.a.", -1],
    'people' : ["not available", "n.a."],
    'price' : ["not available", "n.a."]
    
})
df

Unnamed: 0,tickers,eps,revenue,price,people
0,GOOGL,27.82,87.0,845.0,larry page
1,WMT,4.61,484.0,65.0,
2,MSFT,-1.0,85.0,64.0,bill gates
3,RIL,,50.0,1023.0,mukesh ambani
4,TATA,5.6,,,ratan tata
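
A quick way to confirm the per-column behaviour is to count the missing values in each column. This is a minimal sketch that assumes an inline copy of the data shown above (without the saved index column) in place of stock_data.csv; the names csv_text, df_check and missing_per_column are made up for the demo.

```python
import io
import pandas as pd

# Inline stand-in for stock_data.csv (assumption: same values, no index column).
csv_text = io.StringIO(
    "tickers,eps,revenue,price,people\n"
    "GOOGL,27.82,87,845,larry page\n"
    "WMT,4.61,484,65,n.a.\n"
    "MSFT,-1,85,64,bill gates\n"
    "RIL,not available,50,1023,mukesh ambani\n"
    "TATA,5.6,-1,n.a.,ratan tata\n"
)

df_check = pd.read_csv(csv_text, na_values={
    "eps": ["not available", "n.a."],
    "revenue": ["not available", "n.a.", -1],  # -1 is a placeholder only here
    "people": ["not available", "n.a."],
    "price": ["not available", "n.a."],
})

# Number of NaN cells in each column.
missing_per_column = df_check.isna().sum()
print(missing_per_column)
```

The -1 in the eps column is kept as a real value because -1 is listed only under revenue.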


###


### Writing to a CSV File

###

- df.to_csv is used to write the contents of a DataFrame to a csv file (the file is created, or overwritten if it already exists)
 
    Syntax: df.to_csv("Name_of_file.csv")

- The index argument is used to include or exclude the index (it is included by default)

    Syntax: df.to_csv("Name_of_file.csv", index = False)

In [6]:
import pandas as pd

df.to_csv('new.csv', index = False)
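
The effect of index can be seen by writing to a string instead of a file (to_csv returns the CSV text when no file name is given). This is a small sketch with a made-up two-row DataFrame; df_demo and the other names are only for illustration. It also suggests where an "Unnamed: 0" column, like the one in the outputs above, can come from.

```python
import io
import pandas as pd

df_demo = pd.DataFrame({"tickers": ["GOOGL", "WMT"], "eps": [27.82, 4.61]})

# Default (index=True): the index is written as an unnamed first column.
with_index = df_demo.to_csv()

# index=False: only the data columns are written.
without_index = df_demo.to_csv(index=False)

# Reading the first version back gives an extra "Unnamed: 0" column.
reread = pd.read_csv(io.StringIO(with_index))

# index_col=0 tells read_csv to use that first column as the index instead.
reread_clean = pd.read_csv(io.StringIO(with_index), index_col=0)

print(list(reread.columns))
print(list(reread_clean.columns))
```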

###

- In order to write only specific columns to the new csv file, we'll use columns = ['column name 1', 'column name 2'...]

Syntax:  df.to_csv("Name_of_file.csv", columns = ["column name 1", "column name 2"...])

In [7]:
df.columns

Index(['tickers', 'eps', 'revenue', 'price', 'people'], dtype='object')

In [10]:
df.to_csv("new.csv", columns = ['tickers', 'eps'])

###


### Reading an Excel File

###

- Syntax:   pandas.read_excel("file_name.xlsx", "Sheet name")

In [23]:
import pandas as pd

df = pd.read_excel("stock_data.xlsx", "Sheet1")
df

Unnamed: 0,tickers,eps,revenue,price,people
0,GOOGL,27.82,87,845,larry page
1,WMT,4.61,484,65,n.a.
2,MSFT,-1,85,64,bill gates
3,RIL,not available,50,1023,mukesh ambani
4,TATA,5.6,-1,n.a.,ratan tata


###

- In order to convert a value into another while reading, we'll use converters
- converters is a dictionary that maps a column name to a function; the function is called on every cell of that column

Syntax:   

def function_name(cell):
    if cell == 'value_to_replace':
        return "returning_value"
    return cell

pd.read_excel("File_name.xlsx", "Sheet_name", converters = {
    "Column_name" : function_name
})


In [25]:
import pandas as pd

def convert_people_cell(cell):
    if cell == "n.a.":
        return "Sam Walter"
    return cell

def convert_eps_cell(cell):
    if cell == "not available":
        return None
    return cell

df = pd.read_excel("stock_data.xlsx", "Sheet1", converters = {
    'people' : convert_people_cell,
    'eps' : convert_eps_cell
})

df

Unnamed: 0,tickers,eps,revenue,price,people
0,GOOGL,27.82,87,845,larry page
1,WMT,4.61,484,65,Sam Walter
2,MSFT,-1.0,85,64,bill gates
3,RIL,,50,1023,mukesh ambani
4,TATA,5.6,-1,n.a.,ratan tata
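
read_csv accepts the same converters argument, so the idea can be tried without a spreadsheet. The sketch below uses a made-up two-row CSV held in memory; csv_text and df_conv are names chosen only for this demo.

```python
import io
import pandas as pd

def convert_people_cell(cell):
    # read_csv passes each raw cell of the column to the converter as a string.
    if cell == "n.a.":
        return "Sam Walter"
    return cell

# Inline stand-in for a file (assumption: two rows are enough to show the effect).
csv_text = io.StringIO(
    "tickers,people\n"
    "GOOGL,larry page\n"
    "WMT,n.a.\n"
)

# Pass the function object itself, not its name in quotes.
df_conv = pd.read_csv(csv_text, converters={"people": convert_people_cell})
print(df_conv)
```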


###


### Writing to an Excel File

###

- df.to_excel is used to write a DataFrame to an xlsx file. The sheet_name argument names the sheet (it defaults to "Sheet1" if we leave it out)

Syntax:   df.to_excel("file_name.xlsx", sheet_name = "Sheet_name")

- We can use index = False if we don't want to write the index

In [39]:
import pandas as pd

df.to_excel("new.xlsx", sheet_name = "Stocks")

###

- If we want the data to start somewhere other than the top-left cell, we'll use the startrow and startcol arguments (both count from 0)

Syntax: df.to_excel("File_name.xlsx", sheet_name = "Name_of_sheet", startrow = row_index, startcol = column_index)

In [40]:
import pandas as pd

df.to_excel("new.xlsx", sheet_name = "Stocks", startrow = 1, startcol = 2)

###

- Writing 2 different dataframes to the same excel file but in different sheets
- We'll use the ExcelWriter class


    Syntax:  
    
    with pd.ExcelWriter("name_of_file.xlsx") as writer:
        dataframe1_name.to_excel(writer, sheet_name = "Sheet1_name")
        dataframe2_name.to_excel(writer, sheet_name = "Sheet2_name")

In [41]:
df_stocks = pd.DataFrame({
    
    'tickers' : ['GOOGLE', 'WMT', 'MSFT'],
    'Price' : [845, 65, 54],
    'pe' : [30.37, 14.26, 30.97],
    'eps' : [27.28, 4.61, 2.12]
    
})

df_weather = pd.DataFrame({
    
    'day' : ['1/1/2017', '1/2/2017', '1/3/2017',],
    'temperature' : [32, 35, 28],
    'event' : ['Rainy', 'Sunny', 'Snow']

})

In [42]:
with pd.ExcelWriter('stocks_weather.xlsx') as writer:
    df_stocks.to_excel(writer, sheet_name="stocks")
    df_weather.to_excel(writer, sheet_name="weather")

###


Other parameters for reading/writing CSV and Excel files are listed in the pandas documentation:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html
https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html