# Pandas
`Pandas` is an open-source Python library/module providing high-performance, easy-to-use **data structures** and **data analysis tools** for the Python programming language.

`Pandas` lets you work with tabular data and provides many methods and functions for manipulating and analyzing it.

It can read data from many different sources, such as CSV files, Excel files, HTML, SPSS, Stata, and SQL databases. The data can be ordered or unordered and can hold many different data types. The library also handles missing data well and lets you update, insert, and delete data using vectorized operations.
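
The last two points can be seen in a minimal sketch. The sales numbers are invented for illustration. A missing entry is stored as `NaN`, and it can be counted or filled. Arithmetic then applies to the whole Series at once, with no explicit loop.

```python
import numpy as np
import pandas as pd

# Illustrative values only; np.nan marks a missing entry
sales = pd.Series([100, np.nan, 250, 80])

# Count how many values are missing
missing_count = sales.isna().sum()

# Replace missing values with 0
filled = sales.fillna(0)

# Vectorized arithmetic: every element is doubled in one operation
doubled = filled * 2

# Aggregations such as mean() skip NaN by default
mean_sales = sales.mean()

print(missing_count)     # 1
print(filled.tolist())   # [100.0, 0.0, 250.0, 80.0]
print(doubled.tolist())  # [200.0, 0.0, 500.0, 160.0]
```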

### Installing the Pandas Library/Module

In [2]:
```
!pip install pandas
```
### Importing Pandas Library/Module

In [3]:
```python
import pandas as pd
```
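
You can check that the installation worked by printing the installed version. The number shown depends on your environment.

```python
import pandas as pd

# Print the installed pandas version to confirm the import works
print(pd.__version__)
```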

## Pandas Data Structures: `Series` and `DataFrame`
Data are stored in two main data structures: the **Pandas Series** (`Series`) and the **Pandas DataFrame** (`DataFrame`). Both can easily be sorted, filtered, and merged with other data.

## `Series`
The Pandas `Series` object is a one-dimensional data structure. You can think of it as comparable to a single column in a table. Pandas describes this as a **one-dimensional, homogeneously-typed array**: the data are aligned along a single axis, and all values share the same data type.

A Pandas Series contains data, is indexed (every value has a label), and has a single data type (its `dtype`). Let's see how we can create a Pandas Series:

In [5]:
```python
# Create a pandas series
sr = pd.Series(['Lagos', 'Kano', 'Imo', 'Edo'])
sr
```
```
0    Lagos
1     Kano
2      Imo
3      Edo
dtype: object
```

In [6]:
```python
type(sr)
```
```
pandas.core.series.Series
```
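
The sketch below makes "indexed" and "single data type" concrete. The `pop` values are the ones used later in this notebook. Using the state names as the index is an illustrative choice.

```python
import pandas as pd

# Default index: integer positions 0..n-1
sr = pd.Series(['Lagos', 'Kano', 'Imo', 'Edo'])
print(sr.index)      # RangeIndex(start=0, stop=4, step=1)

# Custom index: the state names become the labels
pop = pd.Series([13, 11, 9, 7], index=['Lagos', 'Kano', 'Imo', 'Edo'])
print(pop['Kano'])   # 11, looked up by label rather than position
print(pop.dtype)     # int64

# One dtype for the whole Series: mixing ints and a float upcasts everything
mixed = pd.Series([1, 2.5, 3])
print(mixed.dtype)   # float64
```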

## `DataFrame`
A Pandas `DataFrame`, put simply, is a **table**. It has both rows and columns, and both are labelled. Each row is an individual record, and each value in a DataFrame corresponds to both a row (a record) and a column.

In [10]:
```python
# Create a pandas dataframe
df = pd.DataFrame(['Lagos', 'Kano', 'Imo', 'Edo'])
df
```

|   | 0 |
|---|---|
| 0 | Lagos |
| 1 | Kano |
| 2 | Imo |
| 3 | Edo |
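
Every value sits where a row label meets a column label, so you can retrieve it with both labels using `.loc`. Here is a minimal sketch with the same list. When a DataFrame is built from a plain list, pandas gives its single column the label `0`.

```python
import pandas as pd

df = pd.DataFrame(['Lagos', 'Kano', 'Imo', 'Edo'])

print(df.shape)          # (4, 1): 4 rows, 1 column
print(list(df.columns))  # [0]: the default column label
print(df.loc[1, 0])      # Kano: row label 1, column label 0
```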


### Creating a Pandas DataFrame
One way to create a **DataFrame** is to pass in a dictionary. Each key becomes a column name, and each list of values becomes that column's data.

In [12]:
```python
df = pd.DataFrame({'State':['Lagos', 'Kano', 'Imo', 'Edo']})
df
```

|   | State |
|---|---|
| 0 | Lagos |
| 1 | Kano |
| 2 | Imo |
| 3 | Edo |


In [15]:
```python
d = {'State': ['Lagos', 'Kano', 'Imo', 'Edo'],
     'region': ['SW', 'NW', 'SE', 'SS']}
d
```
```
{'State': ['Lagos', 'Kano', 'Imo', 'Edo'], 'region': ['SW', 'NW', 'SE', 'SS']}
```

In [17]:
```python
df1 = pd.DataFrame({'State': ['Lagos', 'Kano', 'Imo', 'Edo'],
                    'region': ['SW', 'NW', 'SE', 'SS'],
                    'pop': [13, 11, 9, 7]})
df1
```

|   | State | region | pop |
|---|---|---|---|
| 0 | Lagos | SW | 13 |
| 1 | Kano | NW | 11 |
| 2 | Imo | SE | 9 |
| 3 | Edo | SS | 7 |


In [18]:
```python
type(df)
```
```
pandas.core.frame.DataFrame
```
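
The overview said DataFrames can be sorted and filtered. The following sketch works on the `df1` table, rebuilt here so the block runs on its own. Selecting one column returns a `Series`. A boolean condition keeps only the matching rows, and `sort_values` orders the rows by a column.

```python
import pandas as pd

df1 = pd.DataFrame({'State': ['Lagos', 'Kano', 'Imo', 'Edo'],
                    'region': ['SW', 'NW', 'SE', 'SS'],
                    'pop': [13, 11, 9, 7]})

# A single column is a Series
states = df1['State']
print(type(states))   # <class 'pandas.core.series.Series'>

# Filtering: keep rows where pop is greater than 10
big = df1[df1['pop'] > 10]
print(big['State'].tolist())        # ['Lagos', 'Kano']

# Sorting: order rows by pop, smallest first
ascending = df1.sort_values('pop')
print(ascending['State'].tolist())  # ['Edo', 'Imo', 'Kano', 'Lagos']
```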

## Reading/Importing files with pandas
Creating a DataFrame from scratch can be quite a bit of work. In practice, you likely already have files that you can import (read) into Python with pandas, which converts them into a Pandas DataFrame.

### Reading/Importing `.csv` file

In [19]:
```python
# Importing a csv file in the same directory as the notebook
df2 = pd.read_csv('orders.csv')
df2
```

|   | Customer ID | Order ID | Date | Product | Quantity | UnitPrice |
|---|---|---|---|---|---|---|
| 0 | 3 | 266868 | 9/1/2021 | Noodles | 193 | 100 |
| 1 | 1 | 140794 | 12/1/2021 | Toothpaste | 52 | 500 |
| 2 | 3 | 684759 | 3/1/2021 | Deodorant | 121 | 180 |
| 3 | 3 | 640447 | 1/1/2021 | Noodles | 87 | 100 |
| 4 | 5 | 898637 | 12/1/2021 | Soft Drinks | 71 | 100 |
| ... | ... | ... | ... | ... | ... | ... |
| 695 | 3 | 853295 | 10/1/2021 | Noodles | 291 | 100 |
| 696 | 2 | 253981 | 12/1/2021 | Toothpaste | 57 | 500 |
| 697 | 5 | 208456 | 9/1/2020 | Toothpaste | 222 | 500 |
| 698 | 3 | 727940 | 9/1/2020 | Coffee | 66 | 60 |
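
In this output the `Date` column holds text such as `9/1/2021`, while the Excel read further down returns real dates. `read_csv` only converts the columns you list in `parse_dates`. The sketch assumes `orders.csv` is not available, so it feeds three rows copied from the output above through `io.StringIO`.

```python
import io
import pandas as pd

# Three rows copied from the orders output, held in memory instead of a file
csv_text = """Customer ID,Order ID,Date,Product,Quantity,UnitPrice
3,266868,9/1/2021,Noodles,193,100
1,140794,12/1/2021,Toothpaste,52,500
2,253981,12/1/2021,Toothpaste,57,500
"""

# Without parse_dates the Date column stays as plain strings
raw = pd.read_csv(io.StringIO(csv_text))
print(raw['Date'].iloc[0])      # 9/1/2021

# With parse_dates the column becomes datetimes (month/day/year by default)
orders = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])
print(orders['Date'].iloc[0])   # 2021-09-01 00:00:00
```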


In [23]:
```python
# Importing a csv file from a different directory; each backslash in the path is doubled to escape it
df3 = pd.read_csv('C:\\Users\\USER\\Documents\\Ridoh\\Computer\\orders.csv')
df3
```

|   | Customer ID | Order ID | Date | Product | Quantity | UnitPrice |
|---|---|---|---|---|---|---|
| 0 | 3 | 266868 | 9/1/2021 | Noodles | 193 | 100 |
| 1 | 1 | 140794 | 12/1/2021 | Toothpaste | 52 | 500 |
| 2 | 3 | 684759 | 3/1/2021 | Deodorant | 121 | 180 |
| 3 | 3 | 640447 | 1/1/2021 | Noodles | 87 | 100 |
| 4 | 5 | 898637 | 12/1/2021 | Soft Drinks | 71 | 100 |
| ... | ... | ... | ... | ... | ... | ... |
| 695 | 3 | 853295 | 10/1/2021 | Noodles | 291 | 100 |
| 696 | 2 | 253981 | 12/1/2021 | Toothpaste | 57 | 500 |
| 697 | 5 | 208456 | 9/1/2020 | Toothpaste | 222 | 500 |
| 698 | 3 | 727940 | 9/1/2020 | Coffee | 66 | 60 |


In [25]:
```python
# Importing the same file with a raw string (r'...'), so backslashes need no escaping
df4 = pd.read_csv(r'C:\Users\USER\Documents\Ridoh\Computer\orders.csv')
df4
```

|   | Customer ID | Order ID | Date | Product | Quantity | UnitPrice |
|---|---|---|---|---|---|---|
| 0 | 3 | 266868 | 9/1/2021 | Noodles | 193 | 100 |
| 1 | 1 | 140794 | 12/1/2021 | Toothpaste | 52 | 500 |
| 2 | 3 | 684759 | 3/1/2021 | Deodorant | 121 | 180 |
| 3 | 3 | 640447 | 1/1/2021 | Noodles | 87 | 100 |
| 4 | 5 | 898637 | 12/1/2021 | Soft Drinks | 71 | 100 |
| ... | ... | ... | ... | ... | ... | ... |
| 695 | 3 | 853295 | 10/1/2021 | Noodles | 291 | 100 |
| 696 | 2 | 253981 | 12/1/2021 | Toothpaste | 57 | 500 |
| 697 | 5 | 208456 | 9/1/2020 | Toothpaste | 222 | 500 |
| 698 | 3 | 727940 | 9/1/2020 | Coffee | 66 | 60 |
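
A third option, not used in the cells above, is `pathlib.Path`. Paths built with the `/` operator work on Windows too, and `read_csv` accepts `Path` objects directly. In this sketch a temporary folder stands in for the `C:\Users\...` directory, so the block runs on any machine. The folder names and the two-row file are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Build the path with the / operator instead of typing backslashes
    folder = Path(tmp) / 'Ridoh' / 'Computer'
    folder.mkdir(parents=True)
    csv_path = folder / 'orders.csv'

    # Write a tiny file so there is something to read
    csv_path.write_text('Product,Quantity\nNoodles,193\nCoffee,66\n')

    # read_csv accepts a Path object as well as a string
    df = pd.read_csv(csv_path)

print(df['Quantity'].sum())   # 259
```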


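### Reading/Importing `Excel` files
Excel workbooks (`.xlsx` or `.xls`) are read with `pd.read_excel()`, which loads the first sheet by default. Reading `.xlsx` files requires the `openpyxl` package to be installed.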

In [26]:
```python
df5 = pd.read_excel('Orders Data.xlsx')
df5
```

|   | Customer ID | Order ID | Date | Product | Quantity | UnitPrice |
|---|---|---|---|---|---|---|
| 0 | 3 | 266868 | 2021-09-01 | Noodles | 193 | 100 |
| 1 | 1 | 140794 | 2021-12-01 | Toothpaste | 52 | 500 |
| 2 | 3 | 684759 | 2021-03-01 | Deodorant | 121 | 180 |
| 3 | 3 | 640447 | 2021-01-01 | Noodles | 87 | 100 |
| 4 | 5 | 898637 | 2021-12-01 | Soft Drinks | 71 | 100 |
| ... | ... | ... | ... | ... | ... | ... |
| 695 | 3 | 853295 | 2021-10-01 | Noodles | 291 | 100 |
| 696 | 2 | 253981 | 2021-12-01 | Toothpaste | 57 | 500 |
| 697 | 5 | 208456 | 2020-09-01 | Toothpaste | 222 | 500 |
| 698 | 3 | 727940 | 2020-09-01 | Coffee | 66 | 60 |


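In [30]:
```python
# sheet_name=1 selects the second sheet (sheet positions start at 0)
df_products = pd.read_excel('Orders Data.xlsx', sheet_name=1)
df_products
```

|   | Category | Products | Cost per Product |
|---|---|---|---|
| 0 | Cereal | Noodles | 40 |
| 1 | Cereal | Corn Flakes | 60 |
| 2 | Beverages | Soft Drinks | 50 |
| 3 | Toiletries | Deodorant | 70 |
| 4 | Toiletries | Toothpaste | 350 |
| 5 | Beverages | Coffee | 45 |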


In [31]:
```python
# A sheet can also be selected by its name
df_customers = pd.read_excel('Orders Data.xlsx', sheet_name='Customers')
df_customers
```

|   | Customer ID | Name | Phone | City | State |
|---|---|---|---|---|---|
| 0 | 1 | Ndifereke Ebong | 8190277292 | Ikeja | Lagos |
| 1 | 2 | Chizoba Paul | 8044556621 | Awka | Anambra |
| 2 | 3 | Opone Gabriel | 8022882021 | PH | Rivers |
| 3 | 4 | Katibi Zainab | 7062782910 | VI | Lagos |
| 4 | 5 | Alabi Habeeb | 7085164469 | Ikorodu | Lagos |
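
The orders and customers sheets share a `Customer ID` column, so the two tables can be combined with `pd.merge`. This is the "merged with other data" mentioned in the overview. The sketch uses a few rows copied from the outputs above rather than reading the workbook.

```python
import pandas as pd

# A few rows copied from the Orders and Customers outputs above
orders = pd.DataFrame({'Customer ID': [3, 1, 5],
                       'Product': ['Noodles', 'Toothpaste', 'Soft Drinks'],
                       'Quantity': [193, 52, 71]})
customers = pd.DataFrame({'Customer ID': [1, 3, 5],
                          'Name': ['Ndifereke Ebong', 'Opone Gabriel', 'Alabi Habeeb'],
                          'State': ['Lagos', 'Rivers', 'Lagos']})

# Left join on the shared key: every order keeps its row and gains the customer's columns
merged = pd.merge(orders, customers, on='Customer ID', how='left')
print(merged[['Product', 'Name']])
```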


### Reading/Importing Other Formats and Sources

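In [32]:
```python
# Read the table most recently copied to the clipboard (e.g. cells copied from Excel)
df6 = pd.read_clipboard()
df6
```

|   | ID | Sales |
|---|---|---|
| 0 | A | 10 |
| 1 | B | 50 |
| 2 | C | 20 |
| 3 | D | 15 |
| 4 | E | 30 |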
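
`read_clipboard()` takes the copied text and passes it to `read_csv`, splitting on runs of whitespace by default (`sep='\\s+'`). You can reproduce the same result without a clipboard by giving that kind of text to `read_csv` directly. The tab-separated string below is an assumed stand-in for what was copied.

```python
import io
import pandas as pd

# Stand-in for copied spreadsheet cells: tab-separated columns
copied = "ID\tSales\nA\t10\nB\t50\nC\t20\nD\t15\nE\t30\n"

# sep=r'\s+' mirrors read_clipboard's default separator (any run of whitespace)
df6 = pd.read_csv(io.StringIO(copied), sep=r'\s+')
print(df6['Sales'].tolist())   # [10, 50, 20, 15, 30]
```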


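In [34]:
```python
# Skip the first 2 rows of the sheet; the next row is used as the header
df7 = pd.read_excel('SALES.xlsx', skiprows=2)
df7
```

|   | ID | Sales |
|---|---|---|
| 0 | A | 10 |
| 1 | B | 50 |
| 2 | C | 20 |
| 3 | D | 15 |
| 4 | E | 30 |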
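
`skiprows=2` discards the first two rows before the header is read. This helps when a report has title lines above the actual table. `read_csv` accepts the same parameter. The sketch below uses a CSV string with two hypothetical title lines, so it runs without `SALES.xlsx`.

```python
import io
import pandas as pd

# Two hypothetical title lines sit above the real header
report = """Monthly Sales Report
Prepared by: Sales Team
ID,Sales
A,10
B,50
C,20
"""

# Skip the two title lines; the third line becomes the header
df7 = pd.read_csv(io.StringIO(report), skiprows=2)
print(list(df7.columns))   # ['ID', 'Sales']
```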


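## Writing/Exporting files with pandas

DataFrames can be saved with methods such as `to_csv()` and `to_excel()`. Passing `index=False` leaves the row index out of the file.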
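In [36]:
```python
# Save as a CSV file without the row index
df7.to_csv('sample.csv', index=False)
```

In [37]:
```python
# Save as an Excel sheet named 'mydataframe' (requires openpyxl for .xlsx)
df7.to_excel('exercise.xlsx', index=False, sheet_name='mydataframe')
```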
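
`index=False` matters when the file is read back. Without it, the row index is written as an extra unnamed first column, and `read_csv` loads that column as `Unnamed: 0`. The sketch writes to a temporary folder, and the file names are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

df7 = pd.DataFrame({'ID': ['A', 'B', 'C', 'D', 'E'],
                    'Sales': [10, 50, 20, 15, 30]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = Path(tmp) / 'with_index.csv'
    without_index = Path(tmp) / 'sample.csv'

    # Default: the index 0..4 is written as an unnamed first column
    df7.to_csv(with_index)
    # index=False writes only the data columns
    df7.to_csv(without_index, index=False)

    cols_with = list(pd.read_csv(with_index).columns)
    cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)      # ['Unnamed: 0', 'ID', 'Sales']
print(cols_without)   # ['ID', 'Sales']
```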