# What is Pandas?
Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.

# Why Use Pandas?
Pandas allows us to analyze big data and draw conclusions based on statistical theory.

Pandas can clean messy data sets and make them readable and relevant.

Relevant data is very important in data science.


`Data Science`: a branch of computer science where we study how to store, use, and analyze data in order to derive information from it.


# What Can Pandas Do?
Pandas gives you answers about the data, such as:

- Is there a correlation between two or more columns?
- What is the average value?
- What is the maximum value?
- What is the minimum value?

Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
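
The questions above can be sketched on a small made-up table. The column names and values below are illustrative assumptions, not taken from a real data set.

```python
import pandas as pd

# Hypothetical workout log; None marks a missing measurement.
workouts = pd.DataFrame({
    "duration": [30, 45, 60, 45, None],
    "calories": [200, 300, 400, 300, 250],
})

# Cleaning: drop every row that contains at least one empty (NaN) value.
cleaned = workouts.dropna()

# Correlation between the two columns (Pearson by default).
correlation = cleaned["duration"].corr(cleaned["calories"])

# Basic summary statistics for one column.
average = cleaned["calories"].mean()
highest = cleaned["calories"].max()
lowest = cleaned["calories"].min()

print(len(workouts), "rows before cleaning,", len(cleaned), "after")
print(correlation, average, highest, lowest)
```

Whether to drop incomplete rows or fill them in depends on the data; `dropna()` is simply the most direct match for "delete rows with empty values".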



# Pandas Getting Started
# Installation of Pandas
If you have Python and pip already installed on your system, installing Pandas is very easy.

Install it using this command:

`C:\Users\Your Name>pip install pandas`


If this command fails, use a Python distribution that already has Pandas installed, such as Anaconda (which also bundles the Spyder IDE).

# Import Pandas
Once Pandas is installed, import it in your applications with the `import` keyword:



In [1]:
import pandas

Now Pandas is imported and ready to use.

# Example

In [62]:
import pandas

# Each key becomes a column name; each list holds that column's values.
Nigeria_election_result = {
    'Name': ['Bola Amed Tinubu', 'Peter Obi', 'Atiku Abubaka', 'Kwankwanso'],
    'Party': ['APC', 'Labour', 'PDP', 'NNPP'],
    'Vote count': [8000000, 6000000, 7000000, 2000000],
    'School': ['OAU', 'Unilag', 'Uniosun', 'Covenant'],
}

In [59]:
Nigeria_election_result = pandas.DataFrame(Nigeria_election_result)

In [60]:
Nigeria_election_result

Unnamed: 0,Name,Party,Vote count,School
0,Bola Amed Tinubu,APC,8000000,OAU
1,Peter Obi,Labour,6000000,Unilag
2,Atiku Abubaka,PDP,7000000,Uniosun
3,Kwankwanso,NNPP,2000000,Covenant


# Pandas as pd
Pandas is usually imported under the `pd` alias.

Alias: in Python, an alias is an alternate name for referring to the same thing.

Create an alias with the `as` keyword while importing:

In [5]:
import pandas as pd

Now the Pandas package can be referred to as `pd` instead of `pandas`.

# Example

In [6]:
import pandas as pd

# Same data as before, without the School column.
Nigeria_election_result = {
    'Name': ['Bola Amed Tinubu', 'Peter Obi', 'Atiku Abubaka', 'Kwankwanso'],
    'Party': ['APC', 'Labour', 'PDP', 'NNPP'],
    'Vote count': [8000000, 6000000, 7000000, 2000000],
}

In [7]:
Nigeria_election_result = pd.DataFrame(Nigeria_election_result)

In [8]:
Nigeria_election_result

Unnamed: 0,Name,Party,Vote count
0,Bola Amed Tinubu,APC,8000000
1,Peter Obi,Labour,6000000
2,Atiku Abubaka,PDP,7000000
3,Kwankwanso,NNPP,2000000


# Checking Pandas Version
The version string is stored in the `__version__` attribute.

# Example

In [9]:
import pandas as pd

print(pd.__version__)

1.4.4


# Pandas Series and Index
# Example
Create a Series from a list and label each value with the `index` argument:

In [10]:
import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

myvar

x    1
y    7
z    2
dtype: int64
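
Once a Series has labels, values can be looked up by label. A minimal sketch reusing the `myvar` Series from the cell above; the `default` name is introduced here only for illustration.

```python
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a, index=["x", "y", "z"])

# Look up a single value by its label.
print(myvar["y"])

# Select several labels at once; the result is another Series.
print(myvar[["x", "z"]])

# Without an explicit index, pandas labels the values 0, 1, 2, ...
default = pd.Series(a)
print(default[0])
```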

# Example
Create a Series from a dictionary; the keys become the index labels:

In [11]:
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories)

print(myvar)

day1    420
day2    380
day3    390
dtype: int64
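
When only some of the dictionary entries are needed, the `index` argument can select them. A short sketch using the same `calories` dictionary:

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# Passing index= keeps only the listed keys, in the listed order.
myvar = pd.Series(calories, index=["day1", "day2"])
print(myvar)
```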


# Example
Create a DataFrame from a dictionary of two lists; each key becomes a column:

In [12]:
import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

myvar = pd.DataFrame(data)

myvar

Unnamed: 0,calories,duration
0,420,50
1,380,40
2,390,45
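
The cell above passes plain lists. As a variation, the sketch below starts from two actual Series objects and then reads one row back with `loc`. The names `frame` and `first_row` are chosen here for illustration.

```python
import pandas as pd

calories = pd.Series([420, 380, 390], name="calories")
duration = pd.Series([50, 40, 45], name="duration")

# Series are aligned on their index labels and become columns.
frame = pd.DataFrame({"calories": calories, "duration": duration})

# loc returns the row with the given index label as a Series.
first_row = frame.loc[0]
print(first_row)
```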


# Load Files Into a DataFrame
If your data sets are stored in a file, Pandas can load them into a DataFrame.

# Example
Load a comma-separated values (CSV) file into a DataFrame:

In [13]:
import pandas as pd

df = pd.read_csv('Sales 3.csv')

df.head()

Unnamed: 0,Date,Day,Month,Year,Customer_Age,Age_Group,Customer_Gender,Country,State,Product_Category,Sub_Category,Product,Order_Quantity,Unit_Cost,Unit_Price,Profit,Cost,Revenue
0,2013-11-26,26,November,2013,19,Youth (<25),M,Canada,British Columbia,Accessories,Bike Racks,Hitch Rack - 4-Bike,8,45,120,590,360,950
1,2015-11-26,26,November,2015,19,Youth (<25),M,Canada,British Columbia,Accessories,Bike Racks,Hitch Rack - 4-Bike,8,45,120,590,360,950
2,2014-03-23,23,March,2014,49,Adults (35-64),M,Australia,New South Wales,Accessories,Bike Racks,Hitch Rack - 4-Bike,23,45,120,1366,1035,2401
3,2016-03-23,23,March,2016,49,Adults (35-64),M,Australia,New South Wales,Accessories,Bike Racks,Hitch Rack - 4-Bike,20,45,120,1188,900,2088
4,2014-05-15,15,May,2014,47,Adults (35-64),F,Australia,New South Wales,Accessories,Bike Racks,Hitch Rack - 4-Bike,4,45,120,238,180,418
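
To try `read_csv` without the sales file at hand, the sketch below reads from an in-memory string instead. The `io.StringIO` stand-in, the reduced column set, and the `parse_dates` argument are assumptions added for illustration; they are not part of the original notebook.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the rows reuse values from the output above.
csv_text = """Date,Country,Product,Order_Quantity,Revenue
2013-11-26,Canada,Hitch Rack - 4-Bike,8,950
2014-03-23,Australia,Hitch Rack - 4-Bike,23,2401
2014-05-15,Australia,Hitch Rack - 4-Bike,4,418
"""

# read_csv accepts a path or any file-like object.
# parse_dates converts the Date column to datetime64 instead of object.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(sample.head(2))   # head(n) returns the first n rows
print(sample.dtypes)
```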


In [15]:
df["Revenue"].max()

58074

In [16]:
df["Revenue"].min()

2

In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 113036 entries, 0 to 113035
Data columns (total 18 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   Date              113036 non-null  object
 1   Day               113036 non-null  int64 
 2   Month             113036 non-null  object
 3   Year              113036 non-null  int64 
 4   Customer_Age      113036 non-null  int64 
 5   Age_Group         113036 non-null  object
 6   Customer_Gender   113036 non-null  object
 7   Country           113036 non-null  object
 8   State             113036 non-null  object
 9   Product_Category  113036 non-null  object
 10  Sub_Category      113036 non-null  object
 11  Product           113036 non-null  object
 12  Order_Quantity    113036 non-null  int64 
 13  Unit_Cost         113036 non-null  int64 
 14  Unit_Price        113036 non-null  int64 
 15  Profit            113036 non-null  int64 
 16  Cost              113036 non-null  int

In [19]:
df.isnull().sum()

Date                0
Day                 0
Month               0
Year                0
Customer_Age        0
Age_Group           0
Customer_Gender     0
Country             0
State               0
Product_Category    0
Sub_Category        0
Product             0
Order_Quantity      0
Unit_Cost           0
Unit_Price          0
Profit              0
Cost                0
Revenue             0
dtype: int64
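
The sales data has no missing values, so every count above is zero. The sketch below uses a small hypothetical frame with gaps to show what `isnull().sum()` reports in that case and two common ways to respond.

```python
import pandas as pd

# Hypothetical frame with gaps, since the sales data above has none.
orders = pd.DataFrame({
    "Country": ["Canada", None, "France", "Germany"],
    "Revenue": [950.0, 2401.0, None, 418.0],
})

# Count missing values per column.
missing = orders.isnull().sum()
print(missing)

# Two common responses: drop incomplete rows, or fill the gaps.
complete_rows = orders.dropna()
filled = orders.fillna({"Country": "Unknown", "Revenue": 0})
print(len(complete_rows))
print(filled)
```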

In [20]:
df.columns

Index(['Date', 'Day', 'Month', 'Year', 'Customer_Age', 'Age_Group',
       'Customer_Gender', 'Country', 'State', 'Product_Category',
       'Sub_Category', 'Product', 'Order_Quantity', 'Unit_Cost', 'Unit_Price',
       'Profit', 'Cost', 'Revenue'],
      dtype='object')

In [24]:
df["Product_Category"].unique()

array(['Accessories', 'Clothing', 'Bikes'], dtype=object)

In [45]:
df["Product_Category"].value_counts()

Accessories    70120
Bikes          25982
Clothing       16934
Name: Product_Category, dtype: int64

In [30]:
df["Year"].unique()

array([2013, 2015, 2014, 2016, 2012, 2011])

In [33]:
df["Year"].value_counts()

2014    29398
2016    29398
2013    24443
2015    24443
2012     2677
2011     2677
Name: Year, dtype: int64

In [34]:
df["Country"].unique()

array(['Canada', 'Australia', 'United States', 'Germany', 'France',
       'United Kingdom'], dtype=object)

In [35]:
df["Country"].value_counts()

United States     39206
Australia         23936
Canada            14178
United Kingdom    13620
Germany           11098
France            10998
Name: Country, dtype: int64
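
`unique()` and `value_counts()` have close relatives that are often useful alongside them. A minimal sketch on a made-up `country` Series (the sample values are assumptions):

```python
import pandas as pd

# Small hypothetical sample of a Country column.
country = pd.Series(["United States", "Canada", "United States", "France",
                     "United States", "Canada"])

counts = country.value_counts()                 # absolute counts, largest first
shares = country.value_counts(normalize=True)   # fractions that sum to 1

print(counts)
print(shares)
print(country.nunique())   # number of distinct values
```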

In [36]:
df["Profit"].max()

15096

In [37]:
df[df["Profit"]==15096]

Unnamed: 0,Date,Day,Month,Year,Customer_Age,Age_Group,Customer_Gender,Country,State,Product_Category,Sub_Category,Product,Order_Quantity,Unit_Cost,Unit_Price,Profit,Cost,Revenue
112073,2015-07-24,24,July,2015,52,Adults (35-64),M,Australia,Queensland,Clothing,Vests,"Touring-1000 Yellow, 50",29,1482,2384,15096,42978,58074
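
Looking up the maximum and then filtering on the literal value takes two steps. One possible shortcut is `idxmax()`, sketched below on a hypothetical three-row extract whose values mirror rows shown in this section.

```python
import pandas as pd

# Hypothetical extract; index labels and values mirror rows shown in this section.
sales = pd.DataFrame(
    {"Product": ["Hitch Rack - 4-Bike", "Touring-1000 Yellow, 50", "Patch Kit/8 Patches"],
     "Profit": [590, 15096, 1]},
    index=[0, 112073, 74663],
)

# idxmax returns the index label of the first row holding the maximum.
best_label = sales["Profit"].idxmax()

# loc then fetches that row without hard-coding the value 15096.
best_row = sales.loc[best_label]
print(best_label)
print(best_row["Product"])
```

Note that `idxmax()` returns only the first matching label, whereas the boolean filter returns every row that ties for the maximum.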


In [38]:
df["Profit"].min()

-30

In [39]:
df[df["Profit"]==-30]

Unnamed: 0,Date,Day,Month,Year,Customer_Age,Age_Group,Customer_Gender,Country,State,Product_Category,Sub_Category,Product,Order_Quantity,Unit_Cost,Unit_Price,Profit,Cost,Revenue
48571,2015-12-17,17,December,2015,27,Young Adults (25-34),F,France,Yveline,Clothing,Jerseys,"Short-Sleeve Classic Jersey, XL",31,42,54,-30,1302,1272


In [41]:
# Sanity check on the row above: Revenue (1272) minus Cost (1302)
profit = 1272 - 1302

profit

-30
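
The same check can be run on many rows at once. The sketch below assumes that Profit is defined as Revenue minus Cost, which the rows shown in this section are consistent with; the three rows are copied from the outputs in this section.

```python
import pandas as pd

# Three rows copied from the outputs in this section.
rows = pd.DataFrame({
    "Cost":    [42978, 1302, 1],
    "Revenue": [58074, 1272, 2],
    "Profit":  [15096, -30, 1],
})

# Recompute profit for every row at once and compare with the stored column.
recomputed = rows["Revenue"] - rows["Cost"]
matches = (recomputed == rows["Profit"]).all()
print(matches)
```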

In [43]:
df["Age_Group"].value_counts()

Adults (35-64)          55824
Young Adults (25-34)    38654
Youth (<25)             17828
Seniors (64+)             730
Name: Age_Group, dtype: int64

Value counts can also be plotted in descending order with `df.my_column.value_counts().plot(kind='bar')`, where `my_column` stands for the name of a column.


In [52]:
df.Month.value_counts()

June         11234
December     11200
May          11128
April        10182
March         9674
January       9284
February      9022
October       8750
November      8734
August        8200
September     8166
July          7462
Name: Month, dtype: int64
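
`value_counts()` sorts by count, so the months come out in no calendar order. If calendar order is wanted, one option is `reindex`, sketched here on three of the counts above. The `calendar_months` list and variable names are introduced for illustration.

```python
import pandas as pd

# Counts for three months, copied from the output above.
month_counts = pd.Series({"June": 11234, "December": 11200, "January": 9284})

calendar_months = ["January", "February", "March", "April", "May", "June",
                   "July", "August", "September", "October", "November", "December"]

# Keep only the months that are present, in calendar order.
calendar_order = [m for m in calendar_months if m in month_counts.index]

# reindex rearranges the rows to follow the given label order.
by_calendar = month_counts.reindex(calendar_order)
print(by_calendar)
```

If matplotlib is installed, the reordered Series can be passed to `.plot(kind='bar')` in the same way as the raw counts.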

In [54]:
df["Revenue"].max()

58074

In [55]:
df[df["Revenue"]== 58074]

Unnamed: 0,Date,Day,Month,Year,Customer_Age,Age_Group,Customer_Gender,Country,State,Product_Category,Sub_Category,Product,Order_Quantity,Unit_Cost,Unit_Price,Profit,Cost,Revenue
112073,2015-07-24,24,July,2015,52,Adults (35-64),M,Australia,Queensland,Clothing,Vests,"Touring-1000 Yellow, 50",29,1482,2384,15096,42978,58074


In [56]:
df.Revenue.min()

2

In [57]:
df[df["Revenue"]==2]

Unnamed: 0,Date,Day,Month,Year,Customer_Age,Age_Group,Customer_Gender,Country,State,Product_Category,Sub_Category,Product,Order_Quantity,Unit_Cost,Unit_Price,Profit,Cost,Revenue
74663,2013-08-08,8,August,2013,19,Youth (<25),F,United States,California,Accessories,Tires and Tubes,Patch Kit/8 Patches,1,1,2,1,1,2
74664,2013-08-08,8,August,2013,19,Youth (<25),F,United States,California,Accessories,Tires and Tubes,Patch Kit/8 Patches,1,1,2,1,1,2
74666,2015-08-08,8,August,2015,19,Youth (<25),F,United States,California,Accessories,Tires and Tubes,Patch Kit/8 Patches,1,1,2,1,1,2
74667,2015-08-08,8,August,2015,19,Youth (<25),F,United States,California,Accessories,Tires and Tubes,Patch Kit/8 Patches,1,1,2,1,1,2
74928,2016-03-23,23,March,2016,45,Adults (35-64),M,United States,California,Accessories,Tires and Tubes,Patch Kit/8 Patches,1,1,2,1,1,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
107705,2015-12-16,16,December,2015,28,Young Adults (25-34),M,United States,Washington,Accessories,Tires and Tubes,Patch Kit/8 Patches,1,1,2,1,1,2
107947,2015-10-21,21,October,2015,29,Young Adults (25-34),M,Australia,New South Wales,Accessories,Tires and Tubes,Patch Kit/8 Patches,1,1,2,1,1,2
107952,2015-10-22,22,October,2015,29,Young Adults (25-34),M,Australia,New South Wales,Accessories,Tires and Tubes,Patch Kit/8 Patches,1,1,2,1,1,2
108165,2015-12-14,14,December,2015,64,Adults (35-64),F,France,Seine (Paris),Accessories,Tires and Tubes,Patch Kit/8 Patches,1,1,2,1,1,2
