# Python NumPy and Pandas

**What we cover:**
1. Python NumPy library
    - Importing NumPy library
    - Exploring NumPy mathematical and statistical operations
2. Python Pandas library
    - Importing Pandas library
    - Exploring Pandas data structures
        - Series
        - DataFrames
    - Importing data from CSV files using Pandas
    - Importing data from EXCEL files using Pandas
    - Pandas groupby() method
    - Subsetting or Slicing in Pandas 
        - using square brackets []
        - logical/boolean method
        - loc[] method
        - iloc[] method
    - Dealing with missing data in Pandas
    - Dealing with duplicate data in Pandas
    - Introduction to Joins in Pandas
    - Introduction to Concatenation in Pandas



### NumPy Introduction

<img src="pics/numpy.png" alt="NumPy"
	title="NumPy" 
    width="350" 
    height="200"
    align="center"/>



- NumPy is the core library for scientific computing in Python
- NumPy provides multidimensional array objects, and tools for working with these arrays
- NumPy stands for Numerical Python

### NumPy "Array" data object

- NumPy provides **array** data objects (a data structure) that can be much faster than Python lists for numerical work (a figure of up to 50x is often quoted)

- NumPy arrays are called **"ndarray"** (n-dimensional array)

- Arrays are homogeneous multidimensional structures

- Arrays are grids of values, sharing the same data type, indexed by a tuple of non-negative integers

- NumPy dimensions are called axes

<img src="pics/numpy_array.png?modified=1234567" alt="NumPy array"
	title="NumPy array" 
    width="550" 
    height="400"
    align="center"/>
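
The properties listed above can be checked directly on a small array. This is a minimal sketch; the array name `grid` and its values are made up for illustration.

```python
import numpy as np

# A 2 x 3 array: 2 rows (axis 0) and 3 columns (axis 1)
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.ndim)    # number of axes (dimensions)
print(grid.shape)   # length of each axis
print(grid.dtype)   # the single data type shared by every element

# Aggregations can run along one axis
col_totals = grid.sum(axis=0)   # collapse the rows: one total per column
row_totals = grid.sum(axis=1)   # collapse the columns: one total per row
print(col_totals, row_totals)

# Indexing uses a tuple of non-negative integers: (row, column)
print(grid[1, 2])
```

Here `ndim` gives the number of axes, while `shape` gives the length of each axis, so the two attributes answer different questions.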

#### A useful reference:

[NumPy Cheat Sheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf)

In [None]:
# The following code can be used to install NumPy into the Jupyter notebook environment

#!pip install numpy

In [None]:
# Once NumPy is installed, it can be imported into the Jupyter notebook with the "import" keyword
# NumPy is usually imported under the alias "np"
# The alias keeps later references short

import numpy as np

In [None]:
# Current installed NumPy version can be checked by using the following code

np.__version__

In [None]:
# NumPy ndarray object can be created by using the array() function
# Let's create a 1D array

myarray1 = np.array([11,32,45,7])
myarray1

In [None]:
# The shape attribute returns a tuple with the number of elements along each dimension
# The example below returns (4,) - which means that the array has 1 dimension, and 4 elements in that dimension

myarray1.shape

In [None]:
# Slicing a NumPy 1D Array
# [] - square brackets are used while slicing


myarray1[1:3]

In [None]:
# Let's create a 2D array

myarray2 = np.array([[11,32,45,7], [88,95,22,13]])
myarray2

In [None]:
# The shape attribute returns a tuple with the number of elements along each dimension
# The example below returns (2, 4) - which means that the array has 2 dimensions --> 2 rows and 4 columns

myarray2.shape

In [None]:
# Slicing a NumPy 2D Array
# [] - square brackets are used while slicing

myarray2[1, 1:3]

In [None]:
# The reshape() method changes the shape of the array without changing its underlying data
# The code below changes the (2, 4) 2D array into a (4, 2) 2D array

myarray2.reshape(4,2)
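
A short sketch of how `reshape()` behaves may help here. It uses a made-up array called `values`, and shows that the new shape must account for exactly the same number of elements.

```python
import numpy as np

values = np.arange(8)              # eight elements: 0 to 7
as_2x4 = values.reshape(2, 4)
as_4x2 = as_2x4.reshape(4, 2)

# -1 asks NumPy to infer that dimension from the total element count
inferred = values.reshape(-1, 2)

print(as_2x4.shape, as_4x2.shape, inferred.shape)

# The element count must match, otherwise reshape raises ValueError
try:
    values.reshape(3, 3)
except ValueError as err:
    print("cannot reshape:", err)
```

Passing `-1` for one dimension is convenient when only the number of columns (or rows) is known in advance.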

In [None]:
# The arange() function returns an ndarray containing evenly spaced values
# np.arange(start, stop, step-size): start is included, stop is excluded

myarray3 = np.arange(1,10,2)
myarray3

#### NumPy contains a large number of mathematical and statistical operations

[Math functions](https://numpy.org/doc/stable/reference/routines.math.html)

[Statistical functions](https://numpy.org/doc/stable/reference/routines.statistics.html)

- In the following section, let's explore a few of these commonly used functions

In [None]:
myarray4 = np.array([13, 23, 55, 67, 49])

# calculate the total
np.sum(myarray4)

In [None]:
# calculate the median, 50th percentile, 2nd quartile

np.median(myarray4)

In [None]:
# np.quantile() with q=0.5 returns the same value: the median (50th percentile, 2nd quartile)

np.quantile(myarray4, 0.5)

In [None]:
# calculate standard deviation (np.std() returns the population standard deviation by default, ddof=0)

np.std(myarray4)

In [None]:
# calculate the square root of each element

np.sqrt(myarray4)
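
Two details of these functions are easy to miss: `np.std()` divides by n unless `ddof=1` is passed, and `np.quantile()` accepts a list of quantiles. The sketch below reuses the same five numbers under the made-up name `scores`.

```python
import numpy as np

scores = np.array([13, 23, 55, 67, 49])

population_std = np.std(scores)        # divides by n (ddof=0, the default)
sample_std = np.std(scores, ddof=1)    # divides by n - 1

# Several quantiles can be requested in one call
q1, q2, q3 = np.quantile(scores, [0.25, 0.5, 0.75])

print(population_std, sample_std)
print(q1, q2, q3)
```

The sample version (`ddof=1`) is the one that matches the default of the pandas `std()` method, so the two libraries can give slightly different answers for the same data.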

### Pandas Introduction

<img src="pics/pandas.png" alt="pandas"
	title="pandas" 
    width="350" 
    height="200"
    align="center"/>
    

- Pandas is a high-level data manipulation library. It is built on the NumPy package and provides two valuable data structures: **Series** and **DataFrame**

- A DataFrame allows you to store and manipulate tabular data in rows and columns

### Pandas data structures

- A Pandas Series is a one-dimensional labelled array capable of holding any data type, with axis labels (an index). An example of a Series object is one column from a DataFrame. By default, the index of a Pandas Series starts from 0

- A Pandas DataFrame is a 2-dimensional labelled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dictionary of Series objects. It is generally the most commonly used Pandas object. Like Series, DataFrame accepts many different kinds of inputs. By default, positions in a Pandas DataFrame start from 0 (for both rows and columns)

<img src="pics/pandas_series_and_dataframe.png?modified=12345678" alt="series_and_dataframe"
	title="series_and_dataframe" 
    width="550" 
    height="400"
    align="center"/>
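
The relationship between the two structures can be sketched as follows. The names, ages and the label index are hypothetical values chosen for illustration.

```python
import pandas as pd

# A Series with an explicit label index instead of the default 0, 1, 2, ...
ages = pd.Series([34, 28, 45], index=["Matt", "Ned", "John"], name="Age")
print(ages["Ned"])      # access by label
print(ages.iloc[0])     # access by position

# A DataFrame built from a dictionary of equal-length columns
staff = pd.DataFrame({"Name": ["Matt", "Ned", "John"], "Age": [34, 28, 45]})

# Selecting a single column gives back a Series
age_column = staff["Age"]
print(type(age_column))
print(staff.dtypes)     # each column keeps its own data type
```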

<br><br>

#### A useful reference:

[Pandas Cheat Sheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf)

In [None]:
# The following code can be used to install Pandas into the Jupyter notebook environment

#!pip install pandas

In [None]:
# Once Pandas is installed, it can be imported into the Jupyter notebook with the "import" keyword
# Pandas is usually imported under the alias "pd"
# The alias keeps later references short

import pandas as pd

In [None]:
# Current installed Pandas version can be checked by using the following code

pd.__version__

In [None]:
# The code below creates a Series
# A Series is a one-dimensional labelled array that can hold integers, strings, floats and more
# By default, the Pandas Series index starts from 0

emp = pd.Series(["Matt","Ned","John", "Sam"]) 
emp

In [None]:
# Exploring pandas series data object type

type(emp)

In [None]:
# The Pandas Series index starts from 0; use [] square brackets when slicing

emp[0:2]

In [None]:
# Exploring length of the pandas series

#len(emp)

emp.size

#### Question

<img src="pics/pandas_series_design.png" alt="pandas Series"
	title="pandas Series" 
    width="100" 
    height="200"
    align="left"/>
<br/><br/><br/><br/><br/><br/><br/><br/><br/><br/>

 - Write down the code to design the above Pandas Series data structure
 - Save the series as "age1"

In [None]:
# Write down the answer here:



In [None]:
# Below code showcases creation of a Dataframe
# Pandas DataFrame is a 2-dimensional labelled data structure with columns of potentially different types
# below code showcases how to use a dictionary (key-values) to manually create a Dataframe

df = pd.DataFrame({"SID":[111,112,113], "SName":["Dar","Nedum","Don"], "Score":[85,95,90]})
df

In [None]:
# Exploring pandas Dataframe object type

type(df)

#### Question

<img src="pics/pandas_dataframe_design.png?modified=12345678" alt="pandas"
	title="pandas" 
    width="350" 
    height="200"
    align="left"/>

<br/><br/>
<br/><br/>
<br/><br/>
 - Write down the code to design the above Dataframe structure
 - Save the Dataframe as "car1"

In [None]:
# Write down the answer here:



### Importing Data with read_csv()

In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

In [None]:
# The read_csv() function from pandas is used to import data from CSV files
# read_csv() imports the data as a DataFrame object
# Pandas is widely used in industry for working with tabular data structures

house = pd.read_csv("HousePrices.csv")
house

In [None]:
# head() can be used to explore the first 5 records of the dataframe (indexing starts from 0)

house.head()

In [None]:
# tail() can be used to explore the last 5 records of the dataframe (Indexing starts from 0)

house.tail()

In [None]:
# The describe() method is used for calculating summary statistics such as 
# count, mean, percentiles and std of the numerical values of a Series or DataFrame

house.describe()

In [None]:
# info() method prints information about a DataFrame including the index dtype and columns, non-null values, etc

house.info()
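
The same calls can be tried without a file on disk. The sketch below assumes a small, made-up CSV held in a string (the column names are hypothetical and are not taken from "HousePrices.csv") and reads it through `io.StringIO`.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """price,bedrooms,driveway
42000,3,yes
38500,2,no
49500,3,yes
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head(2))       # first 2 rows instead of the default 5
print(sample.shape)         # (rows, columns)
print(sample.dtypes)        # inferred type of each column
print(sample.describe())    # summarises the numeric columns by default
```

Passing a number to `head()` or `tail()` changes how many rows are shown, which is handy for wide datasets.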

### Importing Data with read_excel()

In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

In [None]:
# The read_excel() function from pandas is used to import data from Excel files
# read_excel() imports the data as a DataFrame object
# Pandas is widely used in industry for working with tabular data structures

churn = pd.read_excel("Telco Customer Churn - Training Dataset.xlsx", sheet_name="Telco Customer Churn")
churn.head()

In [None]:
churn.describe()

In [None]:
churn.info()

#### Question

 - Write down the code to import **"Superstore.xlsx"**
 - Save the Dataframe as **"superstore"**
 - Display the first 8 rows in the **"superstore"** dataset
 - Explore the **"superstore"** dataset by using describe() and info()

In [None]:
# Write down the answer here:

# importing superstore dataset

# display the first 8 rows in the "superstore" dataset


### Pandas groupby() method

| Function | Description   |
|:- |:- |
|count	|Number of non-null observations|
|sum	|Sum of values|
|mean	|Mean of values|
|mad	|Mean absolute deviation (removed in pandas 2.0)|
|median	|Arithmetic median of values|
|min	|Minimum|
|max	|Maximum|
|mode	|Mode|
|abs	|Absolute Value|
|prod	|Product of values|
|std	|Unbiased standard deviation|
|var	|Unbiased variance|
|sem	|Unbiased standard error of the mean|
|skew	|Unbiased skewness (3rd moment)|
|kurt	|Unbiased kurtosis (4th moment)|
|quantile	|Sample quantile (value at %)|
|cumsum	|Cumulative sum|
|cumprod	|Cumulative product|
|cummax	|Cumulative maximum|
|cummin	|Cumulative minimum|

<br><br>

#### A useful reference:

[Pandas groupby() info](https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#descriptive-statistics)

In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

# Let's use the churn dataset
churn = pd.read_excel("Telco Customer Churn - Training Dataset.xlsx", sheet_name="Telco Customer Churn")
churn.head()

In [None]:
# groupby() method can be used to group large amounts of data and compute operations on these groups

churn.groupby(by="Gender").count()

In [None]:
# mean() only applies to numeric columns; numeric_only=True skips the text columns
# (recent pandas versions raise an error if non-numeric columns are included)

churn.groupby(by="Gender").mean(numeric_only=True)

In [None]:
# median() also applies to numeric columns only, so numeric_only=True is passed here as well

churn.groupby(by="Gender").median(numeric_only=True)

In [None]:
# Grouping by multiple columns
# To extend groupby() to work with multiple grouping variables, 
# pass a list of column names to groupby() instead of a single string value

churn.groupby(by=["Gender","PaymentMethod"]).median(numeric_only=True)

In [None]:
# The aggregation functionality provided by the agg() function allows multiple statistics to be calculated. 
# Instructions for aggregation are provided in the form of a python dictionary. 
# The dictionary keys are used to specify the columns upon which you would like to perform calculations, 
# and the dictionary values to specify the function to run (mean, min, max, etc)

# agg() function example1

churn.groupby(by=["Gender","PaymentMethod"]).agg({"MonthlyCharges": ['mean', 'min', 'max']})

In [None]:
# agg() function example2

churn.groupby(by=["Gender","PaymentMethod"]).agg({"MonthlyCharges": ['mean', 'min', 'max'], "TotalCharges": ['mean', 'min', 'max']})
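
As an alternative to the dictionary form, `agg()` also accepts keyword arguments that name each output column ("named aggregation"). The sketch below uses a small made-up frame; only the column names mirror the churn dataset, and the output names `avg_monthly`, `max_monthly` and `total_billed` are arbitrary.

```python
import pandas as pd

# Hypothetical data; the column names mirror the churn dataset
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male"],
    "MonthlyCharges": [30.0, 50.0, 20.0, 80.0],
    "TotalCharges": [300.0, 700.0, 100.0, 900.0],
})

# output_name=(input_column, function)
summary = toy.groupby("Gender").agg(
    avg_monthly=("MonthlyCharges", "mean"),
    max_monthly=("MonthlyCharges", "max"),
    total_billed=("TotalCharges", "sum"),
)
print(summary)
```

Named aggregation produces flat column names, whereas the dictionary form with lists of functions produces a two-level column header.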

#### Question

 - Refer to the **churn** dataset
 - Group the data by **"Contract"**, **"Churn"**, **"Gender"**, and calculate the count, mean and median for **"MonthlyCharges"** and **"TotalCharges"**


In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

# churn dataset
churn = pd.read_excel("Telco Customer Churn - Training Dataset.xlsx", sheet_name="Telco Customer Churn")
churn.head()

In [None]:
# Write down the answer here:



### Subsetting or Slicing in Pandas

We will be exploring four slicing techniques:
1. Using square brackets []
2. Using logical (boolean) expressions
3. Using loc[ ] method
4. Using iloc[ ] method

### 1. Selecting columns using square brackets [ ]

In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

# Let's use the churn dataset
churn = pd.read_excel("Telco Customer Churn - Training Dataset.xlsx", sheet_name="Telco Customer Churn")
churn.head()

In [None]:
# Slicing a pandas DataFrame
# Select columns using [] square brackets
# With square brackets, you can select one or more columns
# With a single set of [] square brackets, the output is a pandas Series

churn["Tenure"]

In [None]:
# With a double set of [] square brackets, the output is a pandas DataFrame

churn[["Tenure"]]

In [None]:
# To select multiple columns, use a double set of [] square brackets
# The output is a pandas DataFrame

churn[["Gender","Tenure"]]

### 2. Selecting rows and columns using logical (boolean) expressions method

In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

# Let's use the churn dataset
churn = pd.read_excel("Telco Customer Churn - Training Dataset.xlsx", sheet_name="Telco Customer Churn")
churn.head()

In [None]:
# The slicing below uses a logical (boolean) expression
# The first set of [] square brackets holds the logical expression (churn.Tenure>40) that filters the rows
# The second set of [] square brackets selects the columns
# The code below outputs the "Gender" and "Tenure" columns for all rows where "Tenure" is greater than 40

churn[churn.Tenure>40][["Gender","Tenure"]]

In [None]:
# The below code will output all rows where "Tenure" is greater than 40 OR "Gender" is male, and "Gender", "Tenure" columns

churn[(churn.Tenure>40) | (churn.Gender == "Male")][["Gender","Tenure"]]

In [None]:
# The below code will output all rows where "Tenure" is greater than 40 AND "Gender" is male, and "Gender", "Tenure" columns

churn[(churn.Tenure>40) & (churn.Gender == "Male")][["Gender","Tenure"]]
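
It can help to look at the boolean masks on their own before combining them. The following sketch uses a four-row made-up frame with the same column names. Each condition is wrapped in parentheses or stored in a variable because `&` and `|` bind more tightly than comparison operators such as `>` and `==`.

```python
import pandas as pd

toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Tenure": [50, 45, 10, 5],
})

long_tenure = toy["Tenure"] > 40          # boolean Series, one value per row
is_male = toy["Gender"] == "Male"

print(long_tenure.tolist())

both = toy[long_tenure & is_male]         # AND: both conditions hold
either = toy[long_tenure | is_male]       # OR: at least one condition holds
neither = toy[~(long_tenure | is_male)]   # NOT: rows matching neither condition

print(len(both), len(either), len(neither))
```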

#### Question

 - Refer to the **churn** dataset
 - Extract **"Contract"**, **"Churn"**, **"Gender"**, **"MonthlyCharges"**, **"TotalCharges"** columns
 - Extract only the records where **"Churn"** is "Yes", and **"Gender"** is "Female"


In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

# churn dataset
churn = pd.read_excel("Telco Customer Churn - Training Dataset.xlsx", sheet_name="Telco Customer Churn")
churn.head()

In [None]:
# Write down the answer here:



### 3. Selecting rows and columns using loc[ ] method
- The pandas attribute .loc allows you to select rows and columns by label in the following fashion

<img src="pics/pandasloc.png?modified=12345678" alt="pandasloc"
	title="pandasloc" 
    width="550" 
    height="400"
    align="center"/>



In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

# Let's use the churn dataset
churn = pd.read_excel("Telco Customer Churn - Training Dataset.xlsx", sheet_name="Telco Customer Churn")
churn.head()

In [None]:
# The below code will output the first 5 rows and "Gender", "Tenure" columns

churn.loc[0:4 , ["Gender", "Tenure"]]

In [None]:
# The below code will output all rows and "Gender", "Tenure" columns

churn.loc[ : , ["Gender", "Tenure"]]

In [None]:
# The below code will output all rows where "Tenure" is greater than 40 and "Gender", "Tenure" columns

churn.loc[churn.Tenure>40 , ["Gender","Tenure"]]

In [None]:
# The below code will output all rows where "Tenure" is greater than 40 OR "Gender" is male, and "Gender", "Tenure" columns

churn.loc[(churn.Tenure>40) | (churn.Gender == "Male") , ["Gender","Tenure"]]

In [None]:
# The below code will output all rows where "Tenure" is greater than 40 AND "Gender" is male, and "Gender", "Tenure" columns

churn.loc[ (churn.Tenure>40) & (churn.Gender == "Male") , ["Gender","Tenure"]]

#### Question

 - Refer to the **churn** dataset
 - Extract **"Contract"**, **"Churn"**, **"Gender"**, **"MonthlyCharges"**, **"TotalCharges"** columns
 - Extract only the records where **"Churn"** is "Yes", and **"Gender"** is "Female"


In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

# churn dataset
churn = pd.read_excel("Telco Customer Churn - Training Dataset.xlsx", sheet_name="Telco Customer Churn")
churn.head()

In [None]:
# Write down the answer here:


### 4. Selecting rows and columns using iloc[ ] method
- The pandas attribute .iloc allows you to select rows and columns by integer position in the following fashion

<img src="pics/pandasiloc.png?modified=12345678" alt="pandasiloc"
	title="pandasiloc" 
    width="550" 
    height="400"
    align="center"/>

In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

# Let's use the churn dataset
churn = pd.read_excel("Telco Customer Churn - Training Dataset.xlsx", sheet_name="Telco Customer Churn")
churn.head()

In [None]:
# The code below outputs the first 5 rows and first 3 columns
# With iloc the end position is excluded, so 0:5 selects positions 0 to 4

churn.iloc[0:5 , 0:3]

In [None]:
# The code below outputs the first 5 rows and the "Gender", "Tenure" columns (column positions 1 and 4)

churn.iloc[0:5 , [1,4]]

In [None]:
# The below code will output all rows and "Gender", "Tenure" columns

churn.iloc[ : , [1,4]]
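
One detail worth checking when switching between `.loc` and `.iloc` is how the end of a slice is handled: `.loc` includes the end label, while `.iloc` excludes the end position. A minimal sketch on a six-row made-up frame:

```python
import pandas as pd

toy = pd.DataFrame(
    {"Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
     "Tenure": [1, 34, 2, 45, 8, 22]}
)

by_label = toy.loc[0:4, ["Gender", "Tenure"]]   # labels 0..4, end label included
by_position = toy.iloc[0:4, [0, 1]]             # positions 0..3, end position excluded

print(len(by_label), len(by_position))

# Column positions can be looked up by name instead of being hard-coded
tenure_pos = toy.columns.get_loc("Tenure")
print(toy.iloc[0:5, [tenure_pos]])
```

Looking up positions with `columns.get_loc()` avoids silent mistakes if the column order of the dataset changes.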

### Dealing with missing data

- Missing data occurs when no value is recorded for a field
- In Pandas, missing data is represented by two values: **None** and **NaN** (an acronym for Not a Number). Pandas treats None and NaN as essentially interchangeable for indicating missing or null values

There are several useful functions for detecting, removing, and replacing null values in a Pandas DataFrame. These functions can also be used on a Pandas Series:

- isnull()
- notnull()
- dropna()
- fillna()
- replace()
- interpolate()
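
Before applying these functions to the full dataset, their effect can be seen on a tiny frame. The sketch below uses made-up values with one `np.nan`-filled numeric column and one text column containing `None`.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Tenure": [1.0, np.nan, 3.0, np.nan],
    "Gender": ["Male", "Female", None, "Male"],
})

missing_per_column = toy.isnull().sum()        # count of missing values per column
print(missing_per_column)

complete_rows = toy.dropna()                   # rows with no missing values at all
tenure_known = toy.dropna(subset=["Tenure"])   # only require "Tenure" to be present
zero_filled = toy["Tenure"].fillna(0)          # replace missing values with a constant
forward_filled = toy["Tenure"].ffill()         # carry the previous value forward

print(len(complete_rows), len(tenure_known))
print(zero_filled.tolist(), forward_filled.tolist())
```

Chaining `isnull().sum()` is a quick way to see which columns need attention before deciding whether to drop or fill.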

In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

# Let's use the churn dataset
churn = pd.read_excel("Telco Customer Churn - Training Dataset.xlsx", sheet_name="Telco Customer Churn")
churn.head()

In [None]:
# isnull() returns a Boolean DataFrame of the same shape: True where a value is missing, False otherwise

pd.isnull(churn)

In [None]:
# creating a boolean Series: True where "Tenure" is missing, False otherwise

bool_series = pd.isnull(churn["Tenure"])
bool_series

In [None]:
# filtering the data to display only the rows where "Tenure" is NaN (True)
churn[bool_series]

In [None]:
# dropping every row that contains at least one missing (NaN) value, using the dropna() function
churn.dropna()

In [None]:
# filling in missing values with a zero, using fillna() 

churn.fillna(0, inplace=False)

In [None]:
# filling a missing value with the previous value (forward fill)
# ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas versions

churn.ffill()

In [None]:
# filling a missing value with the next value (backward fill)
# bfill() replaces fillna(method='bfill'), which is deprecated in recent pandas versions

churn.bfill()

In [None]:
# replacing all NaN values in the dataframe with zeros (np is the NumPy alias imported earlier)

churn.replace(to_replace = np.nan, value = 0, inplace=False)

In [None]:
# interpolate missing values using linear interpolation between the neighbouring known values

churn.interpolate(method ='linear', limit_direction ='forward', inplace=False) 
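
Linear interpolation fills a gap with evenly spaced values between the known values on either side, treating the rows as equally spaced. A minimal sketch on a made-up Series called `readings`:

```python
import numpy as np
import pandas as pd

readings = pd.Series([10.0, np.nan, np.nan, 40.0])

# The two missing values are placed on the straight line from 10 to 40
filled = readings.interpolate(method="linear")
print(filled.tolist())   # [10.0, 20.0, 30.0, 40.0]
```

This is only meaningful for numeric columns where neighbouring rows are related, such as ordered measurements.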

#### Question

Using the appropriate function, filter and display only the complete records (rows with no null values) in the **“churn”** dataset

In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

# Let's use the churn dataset
churn = pd.read_excel("Telco Customer Churn - Training Dataset.xlsx", sheet_name="Telco Customer Churn")
churn.head()

In [None]:
# Write down the answer here:



### Dealing with Duplicate Data in Pandas
A common task in data analysis is dealing with duplicate values.
- duplicated() function can detect duplicates.
- drop_duplicates() function can remove duplicate rows.
- Let's explore these techniques.
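
Both functions accept `subset` and `keep` arguments that control what counts as a duplicate and which copy survives. The sketch below uses a made-up five-row frame; the column names are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "CustomerID": [1, 2, 2, 3, 3],
    "City": ["Perth", "Sydney", "Sydney", "Hobart", "Darwin"],
})

# Full-row duplicates: the first occurrence is not flagged, later copies are
print(toy.duplicated().tolist())
full_row_dupes = toy.duplicated().sum()

# Duplicates judged on one column only
id_dupes = toy.duplicated(subset=["CustomerID"]).sum()

deduped = toy.drop_duplicates()   # keeps the first copy of each identical row
one_per_id = toy.drop_duplicates(subset=["CustomerID"], keep="last")

print(full_row_dupes, id_dupes, len(deduped), len(one_per_id))
```

Note that `drop_duplicates()` returns a new DataFrame and keeps the original row labels; `reset_index(drop=True)` can be chained if fresh labels are wanted.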

In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

# Let's use the CustomerInfo dataset
ci = pd.read_excel("CustomerInfo5.xlsx")
ci

In [None]:
# Let’s use duplicated() function to identify how many duplicate records there are in the dataset
ci.duplicated().sum()

In [None]:
# Let’s use drop_duplicates() function to remove all the duplicates from the dataset
ci_new = ci.drop_duplicates()
ci_new

#### Question
- Refer to the **"SalesInfoApril"** dataset
- Identify how many duplicates there are in the dataset
- Using suitable functions remove all duplicate rows
- Save this information into a new dataset called **"NewSalesInfoApril"**


In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

# Let's import the "SalesInfoApril" dataset
SalesInfoApril = pd.read_excel("SalesInfoApril.xlsx")
SalesInfoApril

In [None]:
# Write down the answer here:



### Introduction to Joins in Pandas

A join is used to combine rows from two tables (TableA and TableB), based on a common column between them. 
<br>
The four main types of joins used in data manipulation are outlined below:


**Left Join():**
Returns all rows from TableA, and all columns from TableA and TableB. 
Rows in TableA with no match in TableB will have missing (NaN) values in the new columns. 
If there are multiple matches between TableA and TableB, all combinations of the matches are returned.

**Right Join():**
Returns all rows from TableB, and all columns from TableA and TableB. 
Rows in TableB with no match in TableA will have missing (NaN) values in the new columns. 
If there are multiple matches between TableA and TableB, all combinations of the matches are returned.

**Inner Join():**
Returns all rows from TableA where there are matching values in TableB, and all columns from TableA and TableB. 
If there are multiple matches between TableA and TableB, all combinations of the matches are returned.

**Full Join():**
Returns all rows and all columns from both TableA and TableB. 
Where there are no matching values, the result contains missing (NaN) values.
<br>

<img src="pics/Pythonjoins.png?modified=12345678" alt="Pythonjoins"
	title="Pythonjoins" 
    width="350" 
    height="400"
    align="center"/>
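
The four join types can be compared side by side on two tiny tables. This is a sketch with made-up data; only the key column names ("CustomerID" and "CusID") follow the lab datasets, and the `indicator=True` option is added here to show where each row came from.

```python
import pandas as pd

# Hypothetical tables standing in for TableA and TableB
customers = pd.DataFrame({"CustomerID": [1, 2, 3], "Name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"CusID": [2, 3, 3, 4], "Amount": [20, 30, 35, 40]})

# Count the rows produced by each join type
row_counts = {}
for how in ["left", "right", "inner", "outer"]:
    joined = pd.merge(customers, orders, how=how,
                      left_on="CustomerID", right_on="CusID")
    row_counts[how] = len(joined)

print(row_counts)

# indicator=True adds a "_merge" column: left_only, right_only or both
outer = pd.merge(customers, orders, how="outer",
                 left_on="CustomerID", right_on="CusID", indicator=True)
print(outer[["CustomerID", "CusID", "_merge"]])
```

Customer 3 has two orders, so it appears twice in every join that includes it, which illustrates the "all combinations of the matches" rule.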

In [None]:
# For this lab, let’s use "CustomerInfo" and "OrderInfo" datasets. 
# Let’s combine these two datasets in a meaningful manner

In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

# Let's import the "CustomerInfo" dataset
CustomerInfo = pd.read_excel("CustomerInfo.xlsx")
print(CustomerInfo.head())

# Let's import the "OrderInfo" dataset
OrderInfo = pd.read_excel("OrderInfo.xlsx")
print(OrderInfo.head())

In [None]:
# Let’s combine the datasets

# Let's do a left join, this is defined by how="left" 
# use "CustomerID" and "CusID" columns as the common column

left_join_df = pd.merge(CustomerInfo, 
                        OrderInfo, 
                        how="left", 
                        left_on="CustomerID", 
                        right_on="CusID")


left_join_df.head()

In [None]:
# Let's do a right join, this is defined by how="right" 
# use "CustomerID" and "CusID" columns as the common column

right_join_df = pd.merge(CustomerInfo, 
                        OrderInfo, 
                        how="right", 
                        left_on="CustomerID", 
                        right_on="CusID")


right_join_df.head()

In [None]:
# Let's do an inner join, this is defined by how="inner" 
# use "CustomerID" and "CusID" columns as the common column

inner_join_df = pd.merge(CustomerInfo, 
                        OrderInfo, 
                        how="inner", 
                        left_on="CustomerID", 
                        right_on="CusID")


inner_join_df.head()

In [None]:
# Let's do a full join, this is defined by how="outer" 
# use "CustomerID" and "CusID" columns as the common column

full_join_df = pd.merge(CustomerInfo, 
                        OrderInfo, 
                        how="outer", 
                        left_on="CustomerID", 
                        right_on="CusID")


full_join_df.head()

### Introduction to Concatenation in Pandas
- Concatenation appends all rows from one table to another; it is also known as a “Set Operation” or “Union”.
- Concatenated data may contain duplicates; these can be removed afterwards with drop_duplicates().
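
A small sketch of the idea, using two made-up tables with the same columns. It also shows that `pd.concat()` keeps the original row labels unless `ignore_index=True` is passed.

```python
import pandas as pd

first = pd.DataFrame({"CustomerID": [1, 2], "City": ["Perth", "Sydney"]})
second = pd.DataFrame({"CustomerID": [2, 3], "City": ["Sydney", "Hobart"]})

stacked = pd.concat([first, second])                         # row labels repeat: 0, 1, 0, 1
renumbered = pd.concat([first, second], ignore_index=True)   # fresh labels: 0, 1, 2, 3

print(stacked.index.tolist())
print(renumbered.index.tolist())

# Customer 2 appears in both tables, so one copy is dropped
unique_rows = renumbered.drop_duplicates().reset_index(drop=True)
print(unique_rows)
```

Repeated row labels are easy to overlook and can cause surprises with `.loc`, so renumbering after concatenation is often a sensible default.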

In [None]:
# For this lab, let’s use "CustomerInfo" and "CustomerInfo2" datasets. 

# import Pandas library with the alias of "pd"
import pandas as pd

# Let's import the "CustomerInfo" dataset
CustomerInfo = pd.read_excel("CustomerInfo.xlsx")
print(CustomerInfo.head())

# Let's import the "CustomerInfo2" dataset
CustomerInfo2 = pd.read_excel("CustomerInfo2.xlsx")
print(CustomerInfo2.head())

In [None]:
# Let’s concatenate "CustomerInfo" and "CustomerInfo2" datasets in a meaningful manner
# Please note that the output contains duplicate rows

all_rows = pd.concat([CustomerInfo, CustomerInfo2])
all_rows

In [None]:
# let's remove duplicates using drop_duplicates()

all_rows.drop_duplicates()

#### Question

- Refer to the **"SalesInfoJan"**, **"SalesInfoFeb"** and  **"ProductInfo"** datasets.
- Using appropriate pandas functions, combine the **"SalesInfoJan"** and **"SalesInfoFeb"** datasets
- Save this information into a new dataset called **"SalesInfo"**
- Now merge/join **"SalesInfo"** with  **"ProductInfo"** dataset
- Save this information into a new dataset called **"AllSales"**

In [None]:
# import Pandas library with the alias of "pd"
import pandas as pd

# Importing all datasets
SalesInfoJan = pd.read_excel("SalesInfoJan.xlsx")
SalesInfoFeb = pd.read_excel("SalesInfoFeb.xlsx")
ProductInfo = pd.read_excel("ProductInfo.xlsx")

print(SalesInfoJan.head())

print(SalesInfoFeb.head())

print(ProductInfo.head())

In [None]:
# Write down the answer here:

