# Python | Pandas DataFrame

### What is Pandas?

<b>pandas</b> is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. 

### What is a Pandas DataFrame?

A <b>Pandas DataFrame</b> is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). 

In other words, a DataFrame holds data aligned in a tabular fashion, in rows and columns. 

A Pandas DataFrame consists of three principal components: the data, the rows, and the columns.

<img src="images/pandas.jpg">
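
The three components can be inspected directly on any DataFrame. The following is a minimal sketch; the frame and its values are made up purely for illustration.

```python
import pandas as pd

# Small illustrative frame (values are invented for this sketch)
df = pd.DataFrame({"Name": ["Ada", "Bayo"], "Age": [21, 23]})

data = df.to_numpy()        # the data, as a 2-D NumPy array
rows = list(df.index)       # the row labels (0, 1, ... by default)
columns = list(df.columns)  # the column labels

print(data)
print(rows)
print(columns)
```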

In practice, a Pandas DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file. 
A Pandas DataFrame can also be created from lists, a dictionary, a list of dictionaries, and so on.

A DataFrame can be created in several ways. Here are some of them:


### Creating a dataframe using List:

In [3]:
# import pandas as pd
import pandas as pd
 
# list of strings
lyst = ['CSC', '102', 'is', 'the', 'best', 'course', 'ever']
 
# Calling DataFrame constructor on list
df = pd.DataFrame(lyst)

# Print the output.
df

Unnamed: 0,0
0,CSC
1,102
2,is
3,the
4,best
5,course
6,ever


### Creating a dataframe using dict of ndarrays/lists:

In [8]:
import pandas as pd
 
# initialise data of lists.
data = {'Name':['Angela', 'Precious', 'Luis', 'Ade'],
        'Age':[20, 21, 19, 18]}
 
# Create DataFrame
df = pd.DataFrame(data)
 
# Print the output.
df

Unnamed: 0,Name,Age
0,Angela,20
1,Precious,21
2,Luis,19
3,Ade,18
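
The introduction also mentions building a DataFrame from a list of dictionaries, which the cells above do not show. Here is a minimal sketch: each dictionary becomes one row and the keys become column names. The third record deliberately omits `Age` (an assumption made for illustration) to show how pandas fills a missing key.

```python
import pandas as pd

# Each dictionary is one row; keys become column labels
records = [
    {"Name": "Angela", "Age": 20},
    {"Name": "Precious", "Age": 21},
    {"Name": "Luis"},  # no "Age" key: pandas fills the gap with NaN
]

df = pd.DataFrame(records)
print(df)
```

Because of the missing value, the `Age` column is stored as floating point rather than integer.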


### Column Selection:

In [9]:
# Import pandas package
import pandas as pd
 
# Define a dictionary containing employee data
data = {'Name':['Clem', 'Prince', 'Edward', 'Adele'],
        'Age':[27, 24, 22, 32],
        'Address':['Abuja', 'Kano', 'Minna', 'Lagos'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
 
# Convert the dictionary into DataFrame 
df = pd.DataFrame(data)
 
# select two columns
df[['Name', 'Qualification']]


Unnamed: 0,Name,Qualification
0,Clem,Msc
1,Prince,MA
2,Edward,MCA
3,Adele,Phd
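
The double brackets in the cell above matter. As a small sketch on a cut-down version of the same data: indexing with a single label returns a Series, while indexing with a list of labels returns a DataFrame, even when the list has only one entry.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Clem", "Prince"],
                   "Age": [27, 24],
                   "Qualification": ["Msc", "MA"]})

one_column = df["Name"]      # single label -> Series
as_frame = df[["Name"]]      # list of labels -> DataFrame with one column

print(type(one_column).__name__)
print(type(as_frame).__name__)
```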


### Row Selection:
pandas provides several ways to retrieve rows from a DataFrame.<br>
The <i><font color="green">DataFrame.iloc[]</font></i> indexer retrieves rows by integer position.<br>

In [12]:
import pandas as pd
 
# Define a dictionary containing employee data
data = {'Name':['Oyin', 'Mary', 'David', 'Bola'],
        'Age':[27, 24, 22, 32],
        'Address':['Asaba', 'Maiduguri', 'Onitsha', 'Kwara'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
 
# Convert the dictionary into DataFrame 
df = pd.DataFrame(data)
 
# select first row
df.iloc[0]

Name              Oyin
Age                 27
Address          Asaba
Qualification      Msc
Name: 0, dtype: object
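
For comparison, the sketch below contrasts position-based selection with `iloc` and label-based selection with `loc`. The letter index labels are hypothetical and were added only so that positions and labels differ.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Oyin", "Mary", "David", "Bola"],
                   "Age": [27, 24, 22, 32]},
                  index=["a", "b", "c", "d"])  # hypothetical labels

first = df.iloc[0]         # first row by position -> Series
first_two = df.iloc[0:2]   # rows at positions 0 and 1 -> DataFrame
by_label = df.loc["c"]     # row whose index label is "c" -> Series

print(first["Name"], len(first_two), by_label["Name"])
```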

### Read from a file:

In [18]:
# importing pandas package
import pandas as pd
 
# making data frame from csv file
data = pd.read_csv("employee_records.csv")

# display the DataFrame
data

Unnamed: 0,employee_id,name,job_title,department,email,phone_number,date_of_hiring,salary
0,396941,Tammy Valdez,"Administrator, charities/voluntary organisations",supply-chains,oanderson@gibson.com,512-507-0524x1231,12/05/2019,77744
1,289507,Debbie Castaneda,"Designer, blown glass/stained glass",deliverables,davidbrown@krueger-harper.com,(590)110-8719x53241,18/12/2013,85059
2,500857,James Rodriguez,"Psychologist, counselling",users,laurenwilliams@knapp.com,(983)416-3026x6694,31/05/2014,86053
3,501196,Hunter Brown,Data scientist,action-items,kimberly31@anderson.com,300-921-0488,03/02/2022,59217
4,325944,Jamie Williams,Production manager,communities,david13@smith.com,741.564.4209x04454,23/11/2021,62276
...,...,...,...,...,...,...,...,...
1995,994876,Michael Beck,Human resources officer,functionalities,nicholsonjoseph@hood-spencer.com,+1-690-355-9016x164,03/06/2021,97006
1996,996490,Alexandra Fuller,"Merchandiser, retail",applications,caroline74@rush-blankenship.com,(150)181-0844,30/01/2018,58635
1997,409079,Juan Campbell,Training and development officer,functionalities,anthony31@alvarez.biz,001-117-571-6559x6177,04/12/2014,136633
1998,329310,Jessica Howard,Press photographer,e-business,joshua51@mitchell.net,273.080.8744,02/03/2019,124931
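
Since a large file is displayed in truncated form, it often helps to check its shape, column names, and column types right after loading. The sketch below assumes a cut-down, in-memory stand-in for `employee_records.csv` (two rows and four of the columns shown above) so that it runs without the file.

```python
import io
import pandas as pd

# Stand-in for employee_records.csv, trimmed for illustration
csv_text = """employee_id,name,date_of_hiring,salary
396941,Tammy Valdez,12/05/2019,77744
289507,Debbie Castaneda,18/12/2013,85059
"""

data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)          # (number of rows, number of columns)
print(list(data.columns))  # column names taken from the header line
print(data.dtypes)         # type that pandas inferred for each column
```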


### Select first row from file

In [21]:
# importing pandas package
import pandas as pd
 
# making data frame from csv file
data = pd.read_csv("employee_records.csv")

df=data.iloc[0]

# display the first row
df

employee_id                                                 396941
name                                                  Tammy Valdez
job_title         Administrator, charities/voluntary organisations
department                                           supply-chains
email                                         oanderson@gibson.com
phone_number                                     512-507-0524x1231
date_of_hiring                                          12/05/2019
salary                                                       77744
Name: 0, dtype: object

### Selecting a Row with Its Column Headers

In [23]:
# importing pandas package
import pandas as pd
 
# making data frame from csv file
data = pd.read_csv("bcg.csv")

df=data.head(1)

# display the first row together with its column headers
df

Unnamed: 0.1,Unnamed: 0,Study,BCGTB,BCGVacc,NoVaccTB,NoVacc,Latitude,Year
0,1,1,4,123,11,139,44,1948


### Looping over rows and columns
Looping means visiting each item of a collection, one after another.<br> A Pandas DataFrame consists of rows and columns, so iterating over it works much like iterating over a dictionary.<br><br>
To iterate over rows, we can use <i><font color="green">iterrows()</font></i>, which yields each row as an (index, Series) pair, or <i><font color="green">itertuples()</font></i>, which yields each row as a named tuple. <i><font color="green">iteritems()</font></i> iterated over columns rather than rows, and pandas 2.0 removed it in favor of <i><font color="green">items()</font></i>.

In [25]:
# importing pandas as pd
import pandas as pd
  
# dictionary of lists (named data so the built-in dict is not shadowed)
data = {'name':["Abdul", "Chukwuemeka", "Seyi", "Matt"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
 
# creating a dataframe from a dictionary 
df = pd.DataFrame(data)

# iterating over rows using iterrows() function 
for i, j in df.iterrows():
    print(i, j)
    print()

0 name      Abdul
degree      MBA
score        90
Name: 0, dtype: object

1 name      Chukwuemeka
degree            BCA
score              40
Name: 1, dtype: object

2 name        Seyi
degree    M.Tech
score         80
Name: 2, dtype: object

3 name      Matt
degree     MBA
score       98
Name: 3, dtype: object
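
As an alternative to `iterrows()`, rows can be visited with `itertuples()`, which gives attribute access to each column. The sketch below collects the names of students above a pass mark; the threshold of 50 is an assumption made for this example. The last line shows the same result computed without an explicit loop.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Abdul", "Chukwuemeka", "Seyi", "Matt"],
                   "degree": ["MBA", "BCA", "M.Tech", "MBA"],
                   "score": [90, 40, 80, 98]})

PASS_MARK = 50  # hypothetical threshold, not part of the original data

passed = []
for row in df.itertuples(index=False):
    # each row is a named tuple: row.name, row.degree, row.score
    if row.score >= PASS_MARK:
        passed.append(row.name)

# The same selection with a boolean mask instead of a loop
passed_vectorised = df.loc[df["score"] >= PASS_MARK, "name"].tolist()

print(passed)
```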



### Looping over Columns :
To loop over columns, we create a list of the DataFrame's column names and then iterate through that list to pull out each column.

In [29]:
# importing pandas as pd
import pandas as pd
   
# dictionary of lists (named data so the built-in dict is not shadowed)
data = {'name':["Bello", "Kamara", "Ugochi", "David"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
  
# creating a dataframe from a dictionary 
df = pd.DataFrame(data)

# creating a list of dataframe columns
columns = list(df)
 
for i in columns:
 
    # printing the third element of the column
    print (df[i][2])

Ugochi
M.Tech
80
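
The same column loop can be written with `DataFrame.items()`, which yields each column label together with the column itself as a Series. This is a sketch of an alternative, not a replacement for the cell above; `iloc[2]` picks the third element by position.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Bello", "Kamara", "Ugochi", "David"],
                   "degree": ["MBA", "BCA", "M.Tech", "MBA"],
                   "score": [90, 40, 80, 98]})

third_values = {}
for label, column in df.items():
    # column is a Series; take its third element by position
    third_values[label] = column.iloc[2]

print(third_values)
```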


### Saving a DataFrame as CSV file

In [34]:
# importing pandas as pd
import pandas as pd
   
# dictionary of lists
records = {'name':["Abel", "Kamsi", "Oyode", "Chinelo"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score':[90, 40, 80, 98]}
  
# creating a dataframe from a dictionary 
df = pd.DataFrame(records)

# saving the dataframe
df.to_csv('record.csv')
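
By default, `to_csv` also writes the row index as an unnamed first column, which is why a file saved this way shows an `Unnamed: 0` column when it is read back. The sketch below demonstrates this with an in-memory buffer instead of a file on disk (an assumption made so the example is self-contained) and shows that `index=False` avoids the extra column.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["Abel", "Kamsi"], "score": [90, 40]})

# Default behaviour: the index is written as the first column
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
with_index = pd.read_csv(buffer)

# index=False leaves the index out of the file
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
without_index = pd.read_csv(buffer)

print(list(with_index.columns))
print(list(without_index.columns))
```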

## Class Project I


####  Go to www.kaggle.com

Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.

#### Download the following datasets:
1. Top Apps in Google Play
2. Cryptocurrency Predict Artificial Intelligence V3
3. Programming Languages Trend Over Time

#### Clue
You can sign in with a Google, Facebook, or LinkedIn account.

#### Task
Display the first 7 rows of each dataset<br>
Select the first 3 columns of each dataset<br>
Display only one row and header of each dataset
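
Each of the three tasks maps onto a single DataFrame operation. The sketch below uses a small made-up frame in place of the downloaded datasets; the column names `a` to `d` are placeholders.

```python
import pandas as pd

# Placeholder data standing in for a downloaded dataset
df = pd.DataFrame({"a": list(range(10)),
                   "b": list(range(10, 20)),
                   "c": list(range(20, 30)),
                   "d": list(range(30, 40))})

first_seven_rows = df.head(7)       # first 7 rows
first_three_cols = df.iloc[:, :3]   # all rows, first 3 columns by position
one_row_with_header = df.head(1)    # one row, column headers kept

print(first_seven_rows.shape)
print(list(first_three_cols.columns))
print(one_row_with_header)
```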


In [35]:
# importing pandas package
import pandas as pd
 
# making data frame from csv file
data = pd.read_csv("GooglePlay.csv")

df=data.head(1)

# display the first row together with its column headers
df

Unnamed: 0.1,Unnamed: 0,App Name,App Id,Category,Developer Id,Developer Website,Developer Email,Content Rating,Ad Supported,In App Purchases
0,1,Google Play services,com.google.android.gms,Tools,Google LLC,https://developers.google.com/android/google-p...,apps-help@google.com,Everyone,False,False


In [55]:
# importing pandas package
import pandas as pd
 
# making data frame from csv file
data = pd.read_csv("GooglePlay.csv")


# read_csv already returns a DataFrame; this call simply wraps it again
df = pd.DataFrame(data)

# iterating over rows using iterrows() function 
for i, j in df.iterrows():
    print(i,j)
    print()

    # Stop after the row at index 6, so that only the first 7 rows are printed
    if i == 6:
        break

0 Unnamed: 0                                                           1
App Name                                          Google Play services
App Id                                          com.google.android.gms
Category                                                         Tools
Developer Id                                                Google LLC
Developer Website    https://developers.google.com/android/google-p...
Developer Email                                   apps-help@google.com
Content Rating                                                Everyone
Ad Supported                                                     False
In App Purchases                                                 False
Name: 0, dtype: object

1 Unnamed: 0                                                           2
App Name                                                       YouTube
App Id                                      com.google.android.youtube
Category                                       Vi

In [38]:
import pandas as pd

datta = pd.read_csv("GooglePlay.csv")

# read_csv already returns a DataFrame; this call simply wraps it again
df = pd.DataFrame(datta)

# creating a list of dataframe columns
columns = list(df)
 
for i in columns:
    # printing the first three elements of the column
    print (df[i][0])
    print (df[i][1])
    print (df[i][2])

1
2
3
Google Play services
YouTube
Google
com.google.android.gms
com.google.android.youtube
com.google.android.googlequicksearchbox
Tools
Video Players & Editors
Tools
Google LLC
Google LLC
Google LLC
https://developers.google.com/android/google-play-services/
https://support.google.com/youtube/topic/2422554?rd=1
https://www.google.com/search/about/
apps-help@google.com
ytandroid-support@google.com
apps-help@google.com
Everyone
Teen
Everyone
False
True
True
False
False
False


In [56]:
# importing pandas package
import pandas as pd

data = pd.read_csv("programminglanguage.csv")

# read_csv already returns a DataFrame; this call simply wraps it again
df = pd.DataFrame(data)

# iterating over rows using iterrows() function 
for index, row in df.iterrows():
    print(index,row)
    print()

    # Stop after the row at index 6, so that only the first 7 rows are printed
    if index == 6:
        break

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 459: invalid start byte
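
This error means the file is not encoded as UTF-8, which is the encoding `read_csv` assumes by default. One possible way around it is to pass the `encoding` parameter. The sketch below is built on assumptions: the real encoding of `programminglanguage.csv` is not known, Latin-1 is only a guess based on byte `0xfc` (which is "ü" in Latin-1), and the in-memory bytes with their column names are invented to reproduce the situation.

```python
import io
import pandas as pd

# Hypothetical file contents, encoded as Latin-1 rather than UTF-8
raw = ("language,creator\n"
       "C++,Bjarne Stroustrup\n"
       "Pascal,Niklaus Wirth (Zürich)\n").encode("latin-1")

try:
    # Default encoding is UTF-8, which cannot decode byte 0xfc
    df = pd.read_csv(io.BytesIO(raw))
except UnicodeDecodeError:
    # Retry with the assumed encoding
    df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")

print(df)
```

Latin-1 can decode any sequence of bytes, so this fallback never raises, but it produces wrong characters if the file actually uses a different encoding. Checking a few values after loading is a sensible precaution.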

In [40]:
# importing pandas package
import pandas as pd

datta = pd.read_csv("programminglanguage.csv")

# read_csv already returns a DataFrame; this call simply wraps it again
df = pd.DataFrame(datta)

# creating a list of dataframe columns
columns = list(df)
 
for i in columns:
    # printing the first three elements of the column
    print (df[i][0])
    print (df[i][1])
    print (df[i][2])

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 459: invalid start byte

In [1]:
# importing pandas package
import pandas as pd

datta = pd.read_csv("programminglanguage.csv")

df=datta.head(1)

# display the first row together with its column headers
df

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 459: invalid start byte

In [43]:
import pandas as pd

data = pd.read_csv("CryptoCurrency.csv")

# read_csv already returns a DataFrame; this call simply wraps it again
df = pd.DataFrame(data)

# iterating over rows using iterrows() function 
for index, row in df.iterrows():
    print(index,row)
    print()

    # Stop after the row at index 6, so that only the first 7 rows are printed
    if index == 6:
        break

0 SYMBOL                       BTC
NAME                     Bitcoin
PRICE(USD)           36000.46553
ID                       bitcoin
RANK                           1
MAXSUPPLY             21000000.0
MARKETCAP(USD)    685214000000.0
Name: 0, dtype: object

1 SYMBOL                       ETH
NAME                    Ethereum
PRICE(USD)           2696.442075
ID                      ethereum
RANK                           2
MAXSUPPLY                    NaN
MARKETCAP(USD)    325420000000.0
Name: 1, dtype: object

2 SYMBOL                     USDT
NAME                     Tether
PRICE(USD)             1.003233
ID                       tether
RANK                          3
MAXSUPPLY                   NaN
MARKETCAP(USD)    83506452939.0
Name: 2, dtype: object

3 SYMBOL                      BNB
NAME                        BNB
PRICE(USD)           378.576567
ID                 binance-coin
RANK                          4
MAXSUPPLY           166801148.0
MARKETCAP(USD)    63147005911.0
Name: 3, d

In [45]:
# importing pandas package
import pandas as pd

datta = pd.read_csv("CryptoCurrency.csv")

# read_csv already returns a DataFrame; this call simply wraps it again
df = pd.DataFrame(datta)

# creating a list of dataframe columns
columns = list(df)
 
for i in columns:
    # printing the first three elements of the column
    print (df[i][0])
    print (df[i][1])
    print (df[i][2])

BTC
ETH
USDT
Bitcoin
Ethereum
Tether
36000.46553
2696.442075
1.003232632
bitcoin
ethereum
tether
1
2
3
21000000.0
nan
nan
685214000000.0
325420000000.0
83506452939.0


In [44]:
import pandas as pd

datta = pd.read_csv("CryptoCurrency.csv")

df=datta.head(1)
# display the first row together with its column headers
df

Unnamed: 0,SYMBOL,NAME,PRICE(USD),ID,RANK,MAXSUPPLY,MARKETCAP(USD)
0,BTC,Bitcoin,36000.46553,bitcoin,1,21000000.0,685214000000.0


## Class Project II

<b>Cadbury Nigeria Plc</b> manufactures and sells branded fast-moving consumer goods in the Nigerian market and exports within West Africa. The Company produces intermediate products, such as cocoa butter, liquor, cake and powder. It exports cocoa butter, cake and liquor to international customers, and sells cocoa powder locally. It operates through three segments: Refreshment Beverages, Confectionery and Intermediate Cocoa Products. The Refreshment Beverages segment includes the manufacture and sale of Bournvita and Hot Chocolate. The Confectionery segment includes the manufacture and sale of Tom Tom and Buttermint. The Intermediate Cocoa Products segment includes the manufacture and sale of cocoa powder, cocoa butter, cocoa liquor and cocoa cake. The Refreshment Beverages brands include CADBURY BOURNVITA and CADBURY 3-in-1 HOT CHOCOLATE. The Confectionery brands include TOMTOM CLASSIC, TOMTOM STRAWBERRY and BUTTERMINT. The Intermediate Cocoa Products brands include COCOA POWDER and COCOA BUTTER.

You have been employed as an expert Python developer to create a program that documents the consumption categories of the company's products and brands. Using your knowledge of Pandas DataFrames, develop a program that saves the list of products (exports, segments and brands) to a .csv file.<br><br>
Hint: save the filename as <font color="green"><i>cadbury_market.csv</i></font>.

In [48]:
# Import pandas package
import pandas as pd
 
# Define a dictionary containing the company's exports and the brands in each segment
Info = {'Exports':['Cocoa butter', 'Cake', 'Liquor', "Cocoa Powder"],
        'Refreshment Beverage':['CADBURY BOURNVITA', 'CADBURY 3-in-1 HOT CHOCOLATE', '', ''],
        'Confectionery':['TOMTOM CLASSIC', 'TOMTOM STRAWBERRY', 'BUTTERMINT', ''],
        'Intermediate Cocoa Products':['COCOA POWDER', 'COCOA BUTTER', '', '']}
# creating a dataframe from a dictionary 
df = pd.DataFrame(Info)

# saving the dataframe
df.to_csv('cadbury_market.csv')