#### Pandas - DataFrames and Series

Pandas is a powerful data manipulation library for Python, widely used for data analysis and data cleaning. It provides two primary data structures: Series and DataFrame.

A Series is a one-dimensional, labeled, array-like object that can hold values of any data type.

A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
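
To make the DataFrame definition more concrete, here is a minimal sketch of what "heterogeneous" and "size-mutable" mean in practice. The column names and values are made up for illustration and do not come from the examples in this notebook.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
df = pd.DataFrame({
    "name": ["A", "B"],
    "score": [9.5, 7.0],
    "passed": [True, False],
})

# Heterogeneous: each column keeps its own dtype.
dtypes = df.dtypes.astype(str).to_dict()
print(dtypes)

# Size-mutable: assigning a new column grows the existing DataFrame.
shape_before = df.shape
df["rank"] = [1, 2]
shape_after = df.shape
print(shape_before, shape_after)
```

The row and column labels (the two axes) are available as `df.index` and `df.columns`.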

In [1]:
import pandas as pd

In [3]:
## Series

## Series -> One-dimensional labeled data structure in pandas that can hold any data type. It is similar to a column in a table.

data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print(series)
print(type(series))

0    1
1    2
2    3
3    4
4    5
dtype: int64
<class 'pandas.core.series.Series'>


In [4]:
## Create a Series from a dictionary (the keys become the index labels)

data = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
series = pd.Series(data)
print(series)

a    1
b    2
c    3
d    4
e    5
dtype: int64


In [5]:
## Create a Series with a custom index
data = [10, 20, 30, 40, 50]
index = ['a', 'b', 'c', 'd', 'e']

series = pd.Series(data, index = index)
print(series)

a    10
b    20
c    30
d    40
e    50
dtype: int64
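
Once a Series has custom labels, values can be read either by label or by integer position. The following is a small sketch of that idea, reusing the same data as the cell above; the variable names are illustrative only.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

by_label = series['c']             # look up by index label
by_position = series.iloc[2]       # look up by integer position
label_slice = series.loc['b':'d']  # label slices include both end labels
doubled = series * 2               # arithmetic applies element-wise and keeps the labels

print(by_label, by_position)
print(label_slice)
print(doubled)
```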


In [6]:
## DataFrame

## Create a DataFrame from a dictionary of lists

data = {
  'Name' : ['Aayush', 'Harvey', 'Mike', 'Louis'],
  'Age': [22, 48, 38, 52],
  'City': ['Pune', 'New York', 'Banglore', 'Noida']
}

df = pd.DataFrame(data)
print(df)

     Name  Age      City
0  Aayush   22      Pune
1  Harvey   48  New York
2    Mike   38  Banglore
3   Louis   52     Noida


In [8]:
## Create a DataFrame from a list of dictionaries

data = [
  {'Name': 'Aayush', 'Age': 22, 'City': 'Pune'},
  {'Name': 'Harvey', 'Age': 48, 'City': 'New York'},
  {'Name': 'Mike', 'Age': 38, 'City': 'Banglore'},
  {'Name': 'Louis', 'Age': 52, 'City': 'Noida'},
]

df = pd.DataFrame(data)
print(df)

     Name  Age      City
0  Aayush   22      Pune
1  Harvey   48  New York
2    Mike   38  Banglore
3   Louis   52     Noida


In [None]:
## Create a DataFrame from a CSV file

df = pd.read_csv('Details.csv')
df.head(10) # get the first 10 rows

   Order ID  Amount  Profit  Quantity     Category      Sub-Category  PaymentMode
0   B-25681    1096     658         7  Electronics  Electronic Games          COD
1   B-26055    5729      64        14    Furniture            Chairs          EMI
2   B-25955    2927     146         8    Furniture         Bookcases          EMI
3   B-26093    2847     712         8  Electronics          Printers  Credit Card
4   B-25602    2617    1151         4  Electronics            Phones  Credit Card
5   B-25881    2244     247         4     Clothing          Trousers  Credit Card
6   B-25696     275    -275         4     Clothing             Saree          COD
7   B-25687     387    -213         5     Clothing             Saree          UPI
8   B-25643      50     -44         2     Clothing       Hankerchief          UPI
9   B-25851     135     -54         5     Clothing             Kurti          COD


In [11]:
df.tail(5) # get last 5 records

      Order ID  Amount  Profit  Quantity     Category  Sub-Category  PaymentMode
1495   B-25700       7      -3         2     Clothing   Hankerchief          COD
1496   B-25757    3151     -35         7     Clothing      Trousers          EMI
1497   B-25973    4141    1698        13  Electronics      Printers          COD
1498   B-25698       7      -2         1     Clothing   Hankerchief          COD
1499   B-25993    4363     305         5    Furniture        Tables          EMI
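
Since `Details.csv` is not included here, the following sketch reads a few rows from an in-memory string instead, so it can be run without the file. The three rows are copied from the `head` output above; the exact layout of the real file is an assumption.

```python
import io
import pandas as pd

# Stand-in for Details.csv: a small CSV held in memory (layout assumed).
csv_text = """Order ID,Amount,Profit,Quantity,Category,Sub-Category,PaymentMode
B-25681,1096,658,7,Electronics,Electronic Games,COD
B-26055,5729,64,14,Furniture,Chairs,EMI
B-25955,2927,146,8,Furniture,Bookcases,EMI
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

first_two = sample.head(2)  # first 2 rows
last_one = sample.tail(1)   # last row

print(first_two)
print(last_one)
```

Both `head` and `tail` return 5 rows when called without an argument.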


In [12]:
### Accessing data from dataframe

data = {
  'Name' : ['Aayush', 'Harvey', 'Mike', 'Louis'],
  'Age': [22, 48, 38, 52],
  'City': ['Pune', 'New York', 'Banglore', 'Noida']
}

df = pd.DataFrame(data)

In [13]:
df

     Name  Age      City
0  Aayush   22      Pune
1  Harvey   48  New York
2    Mike   38  Banglore
3   Louis   52     Noida


In [16]:
df['Name']

0    Aayush
1    Harvey
2      Mike
3     Louis
Name: Name, dtype: object

In [17]:
type(df['Name'])

pandas.core.series.Series
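
As a short sketch of the point above: selecting one column with a single label returns a Series, while selecting with a list of labels returns a DataFrame, even when the list has only one entry. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aayush', 'Harvey', 'Mike', 'Louis'],
    'Age': [22, 48, 38, 52],
    'City': ['Pune', 'New York', 'Banglore', 'Noida'],
})

one_column = df['Name']            # single label -> Series
two_columns = df[['Name', 'Age']]  # list of labels -> DataFrame
still_a_frame = df[['Name']]       # one-element list -> DataFrame with one column

print(type(one_column))
print(type(two_columns))
print(type(still_a_frame))
```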

In [None]:
df.loc[0] # Access a row by its index label

Name    Aayush
Age         22
City      Pune
Name: 0, dtype: object

In [21]:
df.iloc[0, 0] # Access a single value by integer row and column position

'Aayush'
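
With the default index 0, 1, 2, ... the label and the position of a row happen to coincide, which hides the difference between `loc` and `iloc`. The sketch below uses a custom index (the labels 10, 20, 30 are an assumption made for illustration) to make that difference visible.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Aayush', 'Harvey', 'Mike'], 'Age': [22, 48, 38]},
    index=[10, 20, 30],  # assumed custom labels, for illustration only
)

row_by_label = df.loc[20]      # the row whose label is 20
row_by_position = df.iloc[0]   # the first row, whatever its label

label_slice = df.loc[10:20]    # label slices include the end label
position_slice = df.iloc[0:1]  # position slices exclude the end position

print(row_by_label['Name'], row_by_position['Name'])
print(len(label_slice), len(position_slice))
```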

In [22]:
## Accessing a specific element (starting from the 'Name' column)

df['Name']

0    Aayush
1    Harvey
2      Mike
3     Louis
Name: Name, dtype: object

In [23]:
df

     Name  Age      City
0  Aayush   22      Pune
1  Harvey   48  New York
2    Mike   38  Banglore
3   Louis   52     Noida


In [25]:
print(df.at[1, 'Age']) # at: single value by row label and column label

48


In [26]:
df.iat[2, 2] # iat: single value by integer row and column position

'Banglore'
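
`at` and `iat` can also be used to write a single value, not only to read one. Here is a minimal sketch using the same data as above; the new age value is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aayush', 'Harvey', 'Mike', 'Louis'],
    'Age': [22, 48, 38, 52],
    'City': ['Pune', 'New York', 'Banglore', 'Noida'],
})

age = df.at[1, 'Age']  # read by row label and column label
city = df.iat[2, 2]    # read by row position and column position

df.at[0, 'Age'] = 23   # write a single value by label (arbitrary example value)
updated_age = df.iat[0, 1]

print(age, city, updated_age)
```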

In [27]:
df

     Name  Age      City
0  Aayush   22      Pune
1  Harvey   48  New York
2    Mike   38  Banglore
3   Louis   52     Noida


In [31]:
### Data manipulation with dataframes

## Adding "Salary" column to the dataframe
df['Salary'] = [90000, 200000, 150000, 190000]
df

     Name  Age      City  Salary
0  Aayush   22      Pune   90000
1  Harvey   48  New York  200000
2    Mike   38  Banglore  150000
3   Louis   52     Noida  190000


In [None]:
## Remove a column (df before the drop)
df

     Name  Age      City  Salary
0  Aayush   22      Pune   90000
1  Harvey   48  New York  200000
2    Mike   38  Banglore  150000
3   Louis   52     Noida  190000


In [None]:
df.drop('Salary', axis=1) # axis=0 -> rows, axis=1 -> columns
# Not a permanent operation: drop returns a new DataFrame and leaves df unchanged

     Name  Age      City
0  Aayush   22      Pune
1  Harvey   48  New York
2    Mike   38  Banglore
3   Louis   52     Noida


In [35]:
df

     Name  Age      City  Salary
0  Aayush   22      Pune   90000
1  Harvey   48  New York  200000
2    Mike   38  Banglore  150000
3   Louis   52     Noida  190000


In [37]:
# Use inplace=True to make the drop permanent (it modifies df directly)
df.drop('Salary', axis=1, inplace=True)
df

     Name  Age      City
0  Aayush   22      Pune
1  Harvey   48  New York
2    Mike   38  Banglore
3   Louis   52     Noida
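
An alternative to `inplace=True` is to keep the DataFrame that `drop` returns under a new name. The sketch below shows this, using the `columns=` keyword in place of `axis=1`; the name `trimmed` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aayush', 'Harvey', 'Mike', 'Louis'],
    'Age': [22, 48, 38, 52],
    'City': ['Pune', 'New York', 'Banglore', 'Noida'],
    'Salary': [90000, 200000, 150000, 190000],
})

# drop returns a new DataFrame; the original df keeps its Salary column.
trimmed = df.drop(columns=['Salary'])

print(list(df.columns))
print(list(trimmed.columns))
```

Keeping the original around like this can make it easier to re-run notebook cells out of order without losing data.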


In [38]:
# Increase every value in the 'Age' column by 1 (element-wise)
df['Age'] = df['Age'] + 1

In [39]:
df

     Name  Age      City
0  Aayush   23      Pune
1  Harvey   49  New York
2    Mike   39  Banglore
3   Louis   53     Noida


In [40]:
# Drop a row by its index label (axis=0 is the default)
df.drop(3, inplace=True)
df

     Name  Age      City
0  Aayush   23      Pune
1  Harvey   49  New York
2    Mike   39  Banglore
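
Dropping the last row leaves the index contiguous, but dropping a row from the middle leaves a gap in the labels. The following sketch shows the gap and one way to renumber the rows with `reset_index`; the choice of row 1 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aayush', 'Harvey', 'Mike', 'Louis'],
    'Age': [23, 49, 39, 53],
})

# Dropping label 1 keeps the remaining labels as they were: 0, 2, 3.
without_harvey = df.drop(1)

# reset_index(drop=True) renumbers the rows 0, 1, 2 and discards the old labels.
renumbered = without_harvey.reset_index(drop=True)

print(list(without_harvey.index))
print(list(renumbered.index))
```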


In [43]:
df = pd.read_csv('Details.csv')

In [44]:
# Display the data type of each column
print("Data types:\n", df.dtypes)

# Summary statistics for the numeric columns
df.describe()


Data types:
 Order ID        object
Amount           int64
Profit           int64
Quantity         int64
Category        object
Sub-Category    object
PaymentMode     object
dtype: object


           Amount     Profit  Quantity
count      1500.0     1500.0    1500.0
mean   291.847333     24.642  3.743333
std     461.92462  168.55881  2.184942
min           4.0    -1981.0       1.0
25%         47.75      -12.0       2.0
50%         122.0        8.0       3.0
75%        326.25       38.0       5.0
max        5729.0     1864.0      14.0
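
By default `describe` summarizes only the numeric columns, which is why the text columns are missing from the output above. As a sketch, passing `include='all'` adds rows such as `unique`, `top`, and `freq` for the text columns. The three-row sample below is copied from the earlier `head` output and stands in for `Details.csv`.

```python
import pandas as pd

# Small stand-in for Details.csv (three rows taken from the head output).
sample = pd.DataFrame({
    'Order ID': ['B-25681', 'B-26055', 'B-25955'],
    'Amount': [1096, 5729, 2927],
    'Category': ['Electronics', 'Furniture', 'Furniture'],
})

# include='all' summarizes numeric and non-numeric columns together;
# statistics that do not apply to a column are shown as NaN.
summary = sample.describe(include='all')

print(summary)
```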
