# SESSION 5

1. Dataset, Series & Dataframe
2. Creating a DataFrame
3. Operations in DataFrame<br>
	3.1 Reading in DataFrame<br>
	3.2 DataFrame head() Function<br>
	3.3 DataFrame tail() Function<br>
	3.4 Information About the DataSet<br>
	3.5 Loading files into a DataFrame<br>
4. Read CSV in Pandas<br>
    4.1 Read Sample CSV:<br>
    4.2 To check maximum number of rows<br>
    4.3 Read a CSV file and print the first few rows using the head() method:<br>
    4.4 Read a CSV file and print the Last few rows using the tail() method:<br>
    4.5 Pandas describe() method on CSV Files<br>
    4.6 Writing a Pandas DataFrame to CSV file<br>
5. Read Excel in Pandas<br>
    5.1 Install xlrd<br>
    5.2 Load multiple sheets<br>
    5.3 Display List of Columns Headers of the Excel Sheet<br>
    5.4 Pandas read_excel() usecols example<br>

## 1. Dataset, Series & Dataframe

- Dataset : a general term that refers to any collection of data. In pandas, it usually refers to a collection of data that is stored in memory and can be accessed and manipulated using the pandas library.
- Series : a one-dimensional array-like data structure, similar to a column in a spreadsheet.
- DataFrame : a two-dimensional table-like data structure consisting of rows and columns, similar to a spreadsheet.
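
As a minimal sketch of the two structures defined above (the names and values are illustrative, not part of the session's data files), the following builds a small DataFrame and pulls out one of its columns, which is itself a Series:

```python
import pandas as pd

# A DataFrame: two-dimensional, several columns sharing one row index
df = pd.DataFrame(
    {"GPA": [3.7, 3.9, 3.2], "Year": [1, 2, 1]},
    index=["John", "Jane", "Bob"],
)

# Selecting a single column gives back a one-dimensional Series
col = df["GPA"]

print(type(df).__name__, df.shape)
print(type(col).__name__, col.shape)
```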

<img align = 'middle' src = 'Images/Intro-pandas-dataframe.png' width = 700>

<hr>

## 2. Creating a DataFrame

**We can create a DataFrame using :**

- Lists
- Dictionaries
- NumPy Arrays
- Series

In [52]:
# Create an empty DataFrame

# importing the pandas library 
 
import pandas as pd  
df = pd.DataFrame()  
print (df)  

Empty DataFrame
Columns: []
Index: []


**Creating a DataFrame using a List :**

In [53]:
# importing the pandas library 
import pandas as pd 
 
# Create a list of strings  
x = ['Python', 'Pandas']  
  
# Calling Data Frame constructor & passing it the created list
df = pd.DataFrame(x)  
print(df)  

        0
0  Python
1  Pandas


![Untitled%20Diagram%20%2826%29.jpg](attachment:Untitled%20Diagram%20%2826%29.jpg)

**Creating a DataFrame using a Dictionary of lists :**

In [54]:
# importing the pandas library  
import pandas as pd  

# Create a Dictionary of lists 
info = {'ID' :[101, 102, 103],'Department' :['B.Sc','B.Tech','M.Tech']} 

# Calling Data Frame constructor & passing it the created dictionary
df = pd.DataFrame(info)  
print (df)  

    ID Department
0  101       B.Sc
1  102     B.Tech
2  103     M.Tech


**Creating a DataFrame using a Dictionary of Series :**

In [55]:
# importing the pandas library  
import pandas as pd  

# Create a Dictionary of Series
# Each Series needs exactly one index label per value. The DataFrame uses the union
# of all labels and fills the positions a Series lacks with NaN
info = {'one' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),  
   'two' : pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}  

# Calling Data Frame constructor & passing it the created dictionary
d1 = pd.DataFrame(info)  
print (d1)  

   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  4.0    4
e  5.0    5
f  6.0    6
g  NaN    7
h  NaN    8
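
The list at the start of this section also mentions NumPy arrays, which the cells above do not cover. A minimal sketch, assuming a small two-dimensional array and illustrative column names `a` and `b`:

```python
import numpy as np
import pandas as pd

# A 3x2 NumPy array: each inner list becomes one row of the DataFrame
arr = np.array([[1, 2], [3, 4], [5, 6]])

# Column names are passed separately; without them pandas numbers the columns 0, 1, ...
df_np = pd.DataFrame(arr, columns=["a", "b"])
print(df_np)
```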


**Column Selection on a DataFrame :**

In [56]:
# We can select any column from the DataFrame, Here is the code that demonstrates how to select a column from the DataFrame.

# importing the pandas library  
import pandas as pd  

# Create a Dictionary of Series
info = {'one' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f']),  
   'two' : pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])}  

# Calling Data Frame constructor & passing it the created dictionary
d1 = pd.DataFrame(info) 

# Select a column from the created DataFrame 
print (d1 ['one']) 

a    1.0
b    2.0
c    3.0
d    4.0
e    5.0
f    6.0
g    NaN
h    NaN
Name: one, dtype: float64
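
Selecting several columns at once works in a similar way. The sketch below uses an illustrative three-column DataFrame and shows that a single name returns a Series, while a list of names returns a DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6], "three": [7, 8, 9]})

single = df["one"]              # one column name -> Series
subset = df[["one", "three"]]   # list of column names -> DataFrame

print(type(single).__name__)
print(type(subset).__name__)
```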


**Adding columns to a DataFrame :**

In [57]:
# We can also add a new column to an existing DataFrame. The below code demonstrates how to add a new column to an existing DataFrame

# importing the pandas library  
import pandas as pd  
  
# Create a Dictionary of Series
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),  
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}  

# Calling Data Frame constructor & passing it the created dictionary
df = pd.DataFrame(info)  
  
# Adding a new column to an existing DataFrame object   
print ("Add a new column by passing a Series")  
df['three'] = pd.Series([20, 40, 60], index=['a', 'b', 'c'])   
print (df)  

Add a new column by passing a Series
   one  two  three
a  1.0    1   20.0
b  2.0    2   40.0
c  3.0    3   60.0
d  4.0    4    NaN
e  5.0    5    NaN
f  NaN    6    NaN
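
A new column can also be computed from existing ones. This is a small sketch with made-up values; the column names `total` and `double` are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [10, 20, 30]}, index=["a", "b", "c"])

# Element-wise arithmetic on two columns, stored in df as a new column
df["total"] = df["one"] + df["two"]

# assign() returns a new DataFrame with the extra column and leaves df unchanged
df2 = df.assign(double=df["one"] * 2)

print(df2)
```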


**Deleting columns from a DataFrame :**

In [58]:
# Column Deletion:
# We can also delete any column from an existing DataFrame. This code helps to demonstrate how we can delete a column from an existing DataFrame

# importing the pandas library  
import pandas as pd  
  
info = {'one' : pd.Series([1, 2], index= ['a', 'b']),   
   'two' : pd.Series([1, 2, 3], index=['a', 'b', 'c'])}  
     
df = pd.DataFrame(info)  
print ("The DataFrame:")  
print (df)  
  
# Deleting a column using the del statement
print ("Delete the first column:")  
del df['one']  
print (df)  

# Deleting a column using the pop() method (it also returns the removed column)
print ("Delete the other column:")  
df.pop('two')  
print (df)  

The DataFrame:
   one  two
a  1.0    1
b  2.0    2
c  NaN    3
Delete the first column:
   two
a    1
b    2
c    3
Delete the other column:
Empty DataFrame
Columns: []
Index: [a, b, c]
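
Both `del` and `pop()` change the DataFrame in place. When the original should stay intact, `drop()` is a possible alternative; the sketch below uses an illustrative three-column DataFrame to contrast the two behaviours:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2], "two": [3, 4], "three": [5, 6]})

# drop() returns a new DataFrame; df itself still has all three columns
trimmed = df.drop(columns=["one", "three"])

# pop() removes the column from df in place and returns it as a Series
popped = df.pop("two")

print(trimmed.columns.tolist())
print(df.columns.tolist())
```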


**Performing operations on rows of a DataFrame :**
- We can select, add, or delete rows of a DataFrame. We start with row selection, which can be done in several ways :

**1) Selecting a row by Indexing :**

- We can select any row by passing its index label to the ```loc``` indexer.

In [59]:
# importing the pandas library  
import pandas as pd  
  
info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),   
   'two' : pd.Series([6, 7, 8, 9, 10, 11], index=['a', 'b', 'c', 'd', 'e', 'f'])}  
  
df = pd.DataFrame(info)  
print(df)

# Passing a row label to the loc indexer
print (df.loc['b'])  

   one  two
a  1.0    6
b  2.0    7
c  3.0    8
d  4.0    9
e  5.0   10
f  NaN   11
one    2.0
two    7.0
Name: b, dtype: float64


**2) Selecting a row by Slicing :**

In [60]:
# Slicing is another method to select multiple rows using ':' operator.

# importing the pandas library  
import pandas as pd  

info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),   
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])} 
 
df = pd.DataFrame(info) 

# Slicing the DataFrame (similar to slicing a list)
print (df[2:5])  

   one  two
c  3.0    3
d  4.0    4
e  5.0    5
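
Slicing with plain brackets selects rows by position. The same selection can be written more explicitly with `iloc` (by position) or `loc` (by label). A minimal sketch with an illustrative single-column DataFrame; note that a label slice includes its end point, while a position slice does not:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3, 4, 5]}, index=["a", "b", "c", "d", "e"])

by_position = df.iloc[2:5]    # positions 2, 3 and 4 (end position excluded)
by_label = df.loc["c":"d"]    # labels 'c' through 'd' (end label included)

print(by_position)
print(by_label)
```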


**3) Adding new rows to a DataFrame :**
- In pandas versions before 2.0, new rows can be added with the ```append()``` method. It returns a new DataFrame with the rows added at the end; the original is not modified.
- ```append()``` accepts a single row (for example a dictionary) or the rows of another DataFrame.

In [61]:
import pandas as pd

info = {'one' : pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),   
   'two' : pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])} 
 
df = pd.DataFrame(info) 

# create a new row
new_row = { 'one': 6, 'two' : 7}

# append the new row to the DataFrame (DataFrame.append requires pandas < 2.0)
df = df.append(new_row, ignore_index=True)

# print the updated DataFrame
print(df)

   one  two
0  1.0    1
1  2.0    2
2  3.0    3
3  4.0    4
4  5.0    5
5  NaN    6
6  6.0    7


In [62]:
import pandas as pd

# Create the original DataFrame
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Create the second DataFrame to append
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Append df2 to df1 (DataFrame.append requires pandas < 2.0)
df3 = df1.append(df2)

print(df3)

   A  B
0  1  3
1  2  4
0  5  7
1  6  8


**Note : The ```append()``` method was deprecated in pandas 1.4 and removed in pandas 2.0. Use ```pandas.concat``` instead.**

- The ```concat()``` function in pandas is used to concatenate two or more DataFrames into a single DataFrame.

In [63]:
import pandas as pd

# Create the first DataFrame
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Create the second DataFrame
df2 = pd.DataFrame({'C': [5, 6], 'D': [7, 8]})

# Concatenate the two DataFrames along the rows
result = pd.concat([df1, df2], axis=0)

print(result)

     A    B    C    D
0  1.0  3.0  NaN  NaN
1  2.0  4.0  NaN  NaN
0  NaN  NaN  5.0  7.0
1  NaN  NaN  6.0  8.0


**4) Locating Rows of a DataFrame :**
- We can use the loc attribute to return one or more specified row(s) of a DataFrame

In [64]:
import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

# load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)

print(df.loc[2]) # elements in the row at index 2 will be returned


   calories  duration
0       420        50
1       380        40
2       390        45
calories    390
duration     45
Name: 2, dtype: int64


In [65]:
# Accessing elements of multiple rows :

# Use a list of indexes:
import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

# Load data into the DataFrame object:
df = pd.DataFrame(data)

# Accessing elements of rows 0 & 1 :
print(df.loc[[0, 1]])


   calories  duration
0       420        50
1       380        40


**Named Indexes : With the ```index``` argument, you can name your own indexes**

In [66]:
# Add a list of names to give each row a name :

import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df) 

      calories  duration
day1       420        50
day2       380        40
day3       390        45
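
Once the rows are named, `loc` accepts those names instead of numbers. A short sketch reusing the same data:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

row = df.loc["day2"]                 # one row, returned as a Series
cell = df.loc["day2", "calories"]    # row label plus column name -> single value

print(row)
print(cell)
```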


<hr>

## 3. Operations in DataFrame

#### 3.1 Reading in DataFrame

**3.2 DataFrame ```head()``` Function**

- The ```head()``` method returns the first n rows of a DataFrame or Series, based on position (n is 5 by default).
- It is useful for quickly checking that the object contains the right type of data.

In [2]:
import pandas as pd

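# head() Function in DataFrame

info1 = pd.DataFrame({'language':['C#', 'C++', 'Python', 'Java','PHP']})  

# head() without an argument returns the first 5 rows; a notebook only
# displays the last expression of a cell, so this result is not shown
info1.head()  

# head(3) returns the first 3 rows
info1.head(3)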

| | language |
|---|---|
| 0 | C# |
| 1 | C++ |
| 2 | Python |


**3.3 DataFrame ```tail()``` Function**

- The ```tail()``` method returns the last n rows (5 by default), together with the column headers.

In [4]:
import pandas as pd    

# making data frame   
data = pd.read_csv("Images/largedata.csv")
 
# calling tail() method    
# storing in new variable   
data_bottom  = data.tail(2)    
# display   
data_bottom 

| | Series_reference | Period | Data_value | Suppressed | STATUS | UNITS | Magnitude | Subject | Group | Series_title_1 | Series_title_2 | Series_title_3 | Series_title_4 | Series_title_5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6193 | BDCQ.SF8RSCA | 2022.06 | 579.955 | NaN | R | Dollars | 6 | Business Data Collection - BDC | Industry by financial variable (NZSIOC Level 1) | Operating profit | Arts, Recreation and Other Services | Current | Unadjusted | NaN |
| 6194 | BDCQ.SF8RSCA | 2022.09 | 609.161 | NaN | F | Dollars | 6 | Business Data Collection - BDC | Industry by financial variable (NZSIOC Level 1) | Operating profit | Arts, Recreation and Other Services | Current | Unadjusted | NaN |


**3.4 Information About the DataSet**
- A DataFrame object has a method called ```info()``` that summarizes the data set: the index range, each column's name, non-null count and dtype, and the memory usage.

In [6]:
import pandas as pd

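data = pd.read_csv("Images/largedata.csv")

# info() prints its summary itself and returns None, so wrapping it in
# print() adds the final "None" line seen in the output
print(data.info())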

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6195 entries, 0 to 6194
Data columns (total 14 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Series_reference  6195 non-null   object 
 1   Period            6195 non-null   float64
 2   Data_value        5720 non-null   float64
 3   Suppressed        15 non-null     object 
 4   STATUS            6195 non-null   object 
 5   UNITS             6195 non-null   object 
 6   Magnitude         6195 non-null   int64  
 7   Subject           6195 non-null   object 
 8   Group             6195 non-null   object 
 9   Series_title_1    6195 non-null   object 
 10  Series_title_2    6195 non-null   object 
 11  Series_title_3    6195 non-null   object 
 12  Series_title_4    6195 non-null   object 
 13  Series_title_5    0 non-null      float64
dtypes: float64(3), int64(1), object(10)
memory usage: 677.7+ KB
None
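
The individual pieces of the ```info()``` summary are also available separately. A minimal sketch on a small in-memory DataFrame (the column names and values are illustrative, not taken from `largedata.csv`):

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "B", "C"], "value": [1.5, None, 3.0]})

print(df.shape)     # tuple of (number of rows, number of columns)
print(df.dtypes)    # dtype of each column

# isna() marks missing entries; sum() counts them per column
missing = df.isna().sum()
print(missing)
```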


<hr>

## 3.5 Loading files into a DataFrame
- If your data sets are stored in a file, Pandas can load them into a DataFrame.

In [73]:
# Load a comma separated file (CSV file) into a DataFrame :

import pandas as pd

df = pd.read_csv('Images/data.csv')

print(df) 

   Student ID First Name Last Name                  Email  GPA Enrollment Date
0           1       John       Doe     john.doe@email.com  3.7      2020-09-01
1           2       Jane     Smith   jane.smith@email.com  3.9      2020-09-01
2           3        Bob   Johnson  bob.johnson@email.com  3.2      2020-09-01
3           4      Sarah       Lee    sarah.lee@email.com  3.5      2020-09-01
4           5      David      Wong   david.wong@email.com  3.8      2020-09-01


<br>
<img src="images/Image-csv.png" alt="Drawing" style="width: 600px;" align="center"/>
<br>
<br>

<hr>

## 4. Read CSV in Pandas

- A simple way to store big data sets is to use CSV files (comma-separated values files).

- CSV files contain plain text in a well-known format that can be read by almost any tool, including Pandas.

- In our example, we will be using a CSV file called 'data.csv'.


**4.1 Read Sample CSV:**

In [None]:
import pandas as pd

df = pd.read_csv('Images/data.csv')

# to_string() renders the entire DataFrame as text, however many rows it has
print(df.to_string())

   Student ID First Name Last Name                  Email  GPA Enrollment Date
0           1       John       Doe     john.doe@email.com  3.7      2020-09-01
1           2       Jane     Smith   jane.smith@email.com  3.9      2020-09-01
2           3        Bob   Johnson  bob.johnson@email.com  3.2      2020-09-01
3           4      Sarah       Lee    sarah.lee@email.com  3.5      2020-09-01
4           5      David      Wong   david.wong@email.com  3.8      2020-09-01


<br>
<img src="images/Image-text.png" alt="Drawing" style="width: 400px;" align="center"/>
<img src="images/Image-csv.png" alt="Drawing" style="width: 600px;" align="center"/>
<br>
<br>

In [74]:
# Print the DataFrame without the to_string() method :

import pandas as pd

df = pd.read_csv('Images/data.csv')

print(df) 

   Student ID First Name Last Name                  Email  GPA Enrollment Date
0           1       John       Doe     john.doe@email.com  3.7      2020-09-01
1           2       Jane     Smith   jane.smith@email.com  3.9      2020-09-01
2           3        Bob   Johnson  bob.johnson@email.com  3.2      2020-09-01
3           4      Sarah       Lee    sarah.lee@email.com  3.5      2020-09-01
4           5      David      Wong   david.wong@email.com  3.8      2020-09-01


**4.2 To check maximum number of rows**
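- When a DataFrame is printed, the maximum number of rows shown is defined in the Pandas option settings. A DataFrame with more rows than this limit is printed in truncated form.

- You can check the current limit with the ```pd.options.display.max_rows``` statement.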

In [None]:
# Check the number of maximum returned rows :

import pandas as pd

print(pd.options.display.max_rows) 

60


In [69]:
# Increasing the maximum number of rows to display the entire DataFrame :
import pandas as pd

pd.options.display.max_rows = 9999

df = pd.read_csv('Images/largedata.csv')

print(df) 

     Series_reference   Period  Data_value Suppressed STATUS    UNITS  \
0       BDCQ.SF1AA2CA  2016.06    1116.386        NaN      F  Dollars   
1       BDCQ.SF1AA2CA  2016.09    1070.874        NaN      F  Dollars   
2       BDCQ.SF1AA2CA  2016.12    1054.408        NaN      F  Dollars   
3       BDCQ.SF1AA2CA  2017.03    1010.665        NaN      F  Dollars   
4       BDCQ.SF1AA2CA  2017.06    1233.700        NaN      F  Dollars   
5       BDCQ.SF1AA2CA  2017.09    1282.436        NaN      F  Dollars   
6       BDCQ.SF1AA2CA  2017.12    1290.820        NaN      F  Dollars   
7       BDCQ.SF1AA2CA  2018.03    1412.007        NaN      F  Dollars   
8       BDCQ.SF1AA2CA  2018.06    1488.055        NaN      F  Dollars   
9       BDCQ.SF1AA2CA  2018.09    1497.678        NaN      F  Dollars   
10      BDCQ.SF1AA2CA  2018.12    1570.507        NaN      F  Dollars   
11      BDCQ.SF1AA2CA  2019.03    1393.749        NaN      F  Dollars   
12      BDCQ.SF1AA2CA  2019.06    1517.143        N
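
Assigning to ```pd.options.display.max_rows``` changes the limit for the rest of the session. When the higher limit is needed only once, `pd.option_context` is a possible alternative. The sketch below assumes an illustrative 100-row DataFrame rather than `largedata.csv`:

```python
import pandas as pd

df = pd.DataFrame({"n": range(100)})

default_rows = pd.get_option("display.max_rows")

# Inside the block up to 200 rows may be shown, so all 100 rows are rendered
with pd.option_context("display.max_rows", 200):
    full_text = repr(df)

# After the block the previous setting is back in force
restored_rows = pd.get_option("display.max_rows")

print(full_text)
```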

**4.3 Read a CSV file and print the first few rows using the ```head()``` method:**
- The ```head()``` method returns the first rows of a DataFrame; without an argument it returns the first five.
- The code below loads the file into a pandas DataFrame and prints the first ten rows to the console. 
- You can adjust the number of rows displayed by changing the argument of the ```head()``` method, e.g. ```df.head(3)``` would display the first three rows.

In [17]:
import pandas as pd

# Read the CSV file into a pandas dataframe
df = pd.read_csv('Images/largedata.csv')

# Print the first 10 rows of the dataframe
print(df.head(10))

  Series_reference   Period  Data_value Suppressed STATUS    UNITS  Magnitude  \
0    BDCQ.SF1AA2CA  2016.06    1116.386        NaN      F  Dollars          6   
1    BDCQ.SF1AA2CA  2016.09    1070.874        NaN      F  Dollars          6   
2    BDCQ.SF1AA2CA  2016.12    1054.408        NaN      F  Dollars          6   
3    BDCQ.SF1AA2CA  2017.03    1010.665        NaN      F  Dollars          6   
4    BDCQ.SF1AA2CA  2017.06    1233.700        NaN      F  Dollars          6   
5    BDCQ.SF1AA2CA  2017.09    1282.436        NaN      F  Dollars          6   
6    BDCQ.SF1AA2CA  2017.12    1290.820        NaN      F  Dollars          6   
7    BDCQ.SF1AA2CA  2018.03    1412.007        NaN      F  Dollars          6   
8    BDCQ.SF1AA2CA  2018.06    1488.055        NaN      F  Dollars          6   
9    BDCQ.SF1AA2CA  2018.09    1497.678        NaN      F  Dollars          6   

                          Subject  \
0  Business Data Collection - BDC   
1  Business Data Collection - BDC 

**4.4 Read a CSV file and print the Last few rows using the ```tail()``` method:**
- The code below loads the file into a pandas DataFrame and prints the last five rows to the console using the ```tail()``` method.
- You can adjust the number of rows displayed by changing the argument of the ```tail()``` method, e.g. ```df.tail(10)``` would display the last ten rows.

In [18]:
import pandas as pd

# Read the CSV file into a pandas dataframe
df = pd.read_csv('Images/largedata.csv')

# Print the last five rows of the dataframe
print(df.tail())

     Series_reference   Period  Data_value Suppressed STATUS    UNITS  \
6190     BDCQ.SF8RSCA  2021.09     382.195        NaN      F  Dollars   
6191     BDCQ.SF8RSCA  2021.12     397.184        NaN      F  Dollars   
6192     BDCQ.SF8RSCA  2022.03     493.945        NaN      F  Dollars   
6193     BDCQ.SF8RSCA  2022.06     579.955        NaN      R  Dollars   
6194     BDCQ.SF8RSCA  2022.09     609.161        NaN      F  Dollars   

      Magnitude                         Subject  \
6190          6  Business Data Collection - BDC   
6191          6  Business Data Collection - BDC   
6192          6  Business Data Collection - BDC   
6193          6  Business Data Collection - BDC   
6194          6  Business Data Collection - BDC   

                                                Group    Series_title_1  \
6190  Industry by financial variable (NZSIOC Level 1)  Operating profit   
6191  Industry by financial variable (NZSIOC Level 1)  Operating profit   
6192  Industry by financial v

**4.5 Pandas ```describe()``` method on CSV Files**

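- The ```describe()``` method returns a statistical summary of the data in the DataFrame. 
- For numerical columns, the summary contains the following information for each column: <br>
```count``` - The number of non-empty values. <br>
```mean``` - The average (mean) value. <br>
```std``` - The standard deviation. <br>
```min``` and ```max``` - The smallest and largest values. <br>
```25%```, ```50%```, ```75%``` - The percentiles.
- For non-numerical (object) columns, it reports ```count```, ```unique``` (the number of distinct values), ```top``` (the most frequent value) and ```freq``` (how often the top value occurs).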

In [22]:
# importing pandas module
import pandas as pd
  
# making data frame
data = pd.read_csv('Images/nba.csv')

  
# removing null values to avoid errors
data.dropna(inplace=True)
  
# list of dtypes to include
include = ['object', 'float', 'int']
  
# calling describe method
desc = data.describe(include=include)
  
# display
desc

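| | Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|---|---|---|---|---|---|---|---|---|
| count | 364 | 364 | 364.0 | 364 | 364.0 | 364 | 364.0 | 364 | 364.0 |
| unique | 364 | 30 | NaN | 5 | NaN | 17 | NaN | 115 | NaN |
| top | Jeff Green | New Orleans Pelicans | NaN | SG | NaN | 6-9 | NaN | Kentucky | NaN |
| freq | 1 | 16 | NaN | 87 | NaN | 49 | NaN | 22 | NaN |
| mean | NaN | NaN | 16.82967 | NaN | 26.615385 | NaN | 219.785714 | NaN | 4620311.0 |
| std | NaN | NaN | 14.994162 | NaN | 4.233591 | NaN | 24.793099 | NaN | 5119716.0 |
| min | NaN | NaN | 0.0 | NaN | 19.0 | NaN | 161.0 | NaN | 55722.0 |
| 25% | NaN | NaN | 5.0 | NaN | 24.0 | NaN | 200.0 | NaN | 1000000.0 |
| 50% | NaN | NaN | 12.0 | NaN | 26.0 | NaN | 220.0 | NaN | 2515440.0 |
| 75% | NaN | NaN | 25.0 | NaN | 29.0 | NaN | 240.0 | NaN | 6149694.0 |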
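
The effect of the ```include``` argument is easier to see on a small example. The sketch below uses an illustrative in-memory DataFrame, not `nba.csv`, and compares the default call with `include="all"`:

```python
import pandas as pd

df = pd.DataFrame({"team": ["A", "B", "A", "C"], "age": [21, 25, 30, 24]})

numeric_only = df.describe()                # default: numerical columns only
everything = df.describe(include="all")     # numerical and non-numerical columns

print(numeric_only)
print(everything)
```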


**4.6 Writing a Pandas DataFrame to CSV file**

To write a Pandas DataFrame to a CSV file, we can take the following steps:

- Create a DataFrame, df.
- Print the input DataFrame.
- Use ```df.to_csv``` to save the values of the DataFrame to a CSV (comma-separated values) file.

In [16]:
import pandas as pd
# Create  DataFrame with below sample values
df = pd.DataFrame(
   {
      "x": [5, 2, 1, 9],
      "y": [4, 1, 5, 10],
      "z": [4, 1, 5, 0]
   }
)

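# Write the DataFrame into a CSV file (the row index is written as the first, unnamed column)
print ("Input DataFrame is:\n", df)
df.to_csv("Images/test.csv", sep=',')

# Read from CSV to DataFrame (the saved index comes back as the "Unnamed: 0" column)
df_csv = pd.read_csv("Images/test.csv")
print ("CSV DataFrame is:\n", df_csv)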


Input DataFrame is:
    x   y  z
0  5   4  4
1  2   1  1
2  1   5  5
3  9  10  0
CSV DataFrame is:
    Unnamed: 0  x   y  z
0           0  5   4  4
1           1  2   1  1
2           2  1   5  5
3           3  9  10  0
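
The extra ```Unnamed: 0``` column is the row index that ```to_csv``` writes by default. One way to avoid it is to pass `index=False` when writing. The sketch below uses an in-memory text buffer in place of a file on disk, purely so that it runs without creating `Images/test.csv`:

```python
import io
import pandas as pd

df = pd.DataFrame({"x": [5, 2], "y": [4, 1]})

# An in-memory text buffer stands in for a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)    # index=False: do not write the row labels

buffer.seek(0)                    # rewind the buffer before reading it back
df_back = pd.read_csv(buffer)
print(df_back.columns.tolist())
```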


<hr>

## 5. Read Excel in Pandas

- Excel files (.xls, .xlsx) are a common way to store data sets together with rich formatting.

**Difference between CSV and XLS file formats**<br>
*The CSV format is a plain text format in which values are separated by commas (Comma Separated Values), while the XLS format is a binary Excel file format that holds information about all the worksheets in a file, including both content and formatting. 
CSV files contain plain text in a well-known format that can be read by almost any tool, including Pandas.*


**5.1 Install xlrd**<br>

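Pandas ```read_excel``` relies on a separate engine library to parse Excel files.

- ```xlrd``` is a library for reading legacy .xls files in Python. Since version 2.0 it no longer reads .xlsx files.

- For .xlsx files, such as the ones used in this session, Pandas uses the ```openpyxl``` library.

- Both can be installed with pip. (pip3 depending on the environment)

> $ pip install xlrd openpyxl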


**Read excel: Python Example**

- Specify the path or URL of the Excel file in the first argument.
- If there are multiple sheets, only the first sheet is read by default.
- The result is returned as a DataFrame.

In [1]:
import pandas as pd

df = pd.read_excel('Images/excel-sample.xlsx')

print(df)

   Student ID First Name Last Name                  Email Enrollment Date
0           1       John       Doe     john.doe@email.com      2020-09-01
1           2       Jane     Smith   jane.smith@email.com      2020-09-01
2           3        Bob   Johnson  bob.johnson@email.com      2020-09-01
3           4      Sarah       Lee    sarah.lee@email.com      2020-09-01
4           5      David      Wong   david.wong@email.com      2020-09-01


<br>
<img src="images/Image-excel.png" alt="Drawing" style="width: 600px;" align="center"/>
<br>
<br>

**5.2 Load multiple sheets**
- You can pass a list to the ```sheet_name``` argument to read several sheets at once. 
- Each item in the list can be a zero-based sheet position or a sheet name.
- The result is a dictionary whose keys are the specified positions or names. 
- The value stored under each key is the DataFrame read from that sheet.

In [5]:
import pandas as pd


df_sheet_multi = pd.read_excel('Images/excel-sample.xlsx', sheet_name=[0, 'sheet2'])

print("First Sheet at Index 0:")
print(df_sheet_multi[0])
print("Sheet-2:")
print(df_sheet_multi['sheet2'])

print("Datatype Details:")
print(type(df_sheet_multi))
print(type(df_sheet_multi['sheet2']))

First Sheet at Index 0:
   Student ID First Name Last Name                  Email Enrollment Date
0           1       John       Doe     john.doe@email.com      2020-09-01
1           2       Jane     Smith   jane.smith@email.com      2020-09-01
2           3        Bob   Johnson  bob.johnson@email.com      2020-09-01
3           4      Sarah       Lee    sarah.lee@email.com      2020-09-01
4           5      David      Wong   david.wong@email.com      2020-09-01
Sheet-2:
   Student ID First Name Last Name  GPA
0           1       John       Doe  9.6
1           2       Jane     Smith  8.8
2           3        Bob   Johnson  7.0
3           4      Sarah       Lee  9.1
4           5      David      Wong  8.9
Datatype Details:
<class 'dict'>
<class 'pandas.core.frame.DataFrame'>


**5.3 Display List of Columns Headers of the Excel Sheet**
- We can get the list of column headers using the columns property of the dataframe object.

In [10]:
import pandas as pd


df_sheet_multi = pd.read_excel('Images/excel-sample.xlsx', sheet_name=[0, 'sheet2'])

print("Column Headers of the Excel Sheet:")
print(df_sheet_multi[0].columns.to_numpy())

Column Headers of the Excel Sheet:
['Student ID' 'First Name' 'Last Name' 'Email' 'Enrollment Date']




**5.4 Pandas ```read_excel() usecols``` example**
- We can specify the column names to be read from the Excel file. 
- It’s useful when you are interested in only a few of the columns of the Excel sheet.

In [11]:
import pandas

excel_data_df = pandas.read_excel('Images/excel-sample.xlsx', sheet_name='sheet2', usecols=['First Name', 'Last Name', 'GPA'])
print(excel_data_df)

  First Name Last Name  GPA
0       John       Doe  9.6
1       Jane     Smith  8.8
2        Bob   Johnson  7.0
3      Sarah       Lee  9.1
4      David      Wong  8.9


<hr>

# Homework Questions

1) Create a dataframe for the below array
```
     x = ['Python', 'Pandas','Java']
```

2) Using Pandas library, create a DataFrame from Series named 's'

3) Create a DataFrame from Dict of Series
```
   info1 = {'one' : pd.Series([11, 22, 33, 44, 65], index=['a', 'b', 'c', 'd', 'e']),   
            'two' : pd.Series([11, 62, 73, 43, 25, 46], index=['a', 'b', 'c', 'd', 'e', 'f'])}  
```

4) Load a CSV file into a DataFrame

5) Check the maximum number of rows that Pandas displays (max_rows)

6) Increase the maximum number of rows to display the entire DataFrame.

7) Write code to slice rows from a DataFrame

8) Add a new column to an existing DataFrame

9) Using Pandas library, delete a column named 'Age' from a DataFrame named 'df'.
```
        data = {'Name': ['John', 'Mike', 'Sara', 'David'],
        'Age': [25, 30, 28, 35],
        'Gender': ['Male', 'Male', 'Female', 'Male']}
```

10) Using Pandas library, select all the rows and columns where 'Age' is greater than 30 in a DataFrame named 'df'.
```
        data = {'Name': ['John', 'Mike', 'Sara', 'David'],
        'Age': [25, 30, 28, 35],
        'Gender': ['Male', 'Male', 'Female', 'Male']}
```

11) Using Pandas library, Read Excel File without Header Row

12) Concatenate the following two pandas dataframes:
   ```
         Name  Age  Gender
      0  John   25    Male
      1  Jane   30  Female
      2  Mark   20    Male
   ```
         and

   ```   
         Name  Age Gender
      0  Lily 
   ``` 


<hr>

For homework solutions, please refer to the ```HomeworkSolutions.ipynb``` file.