# PANDAS :
-----------
Pandas has two main data structures:
- Series
- DataFrame

In [1]:
import pandas as pd
import numpy as np

<center><h1>Pandas Series():</h1></center>
<hr>

A Series can be created from:
- A sequence of scalar values, such as a list or tuple. If no index is passed, the index starts from 0; otherwise the index labels are taken from the sequence passed as the second parameter (`index`).
- A Python dictionary. The keys become the index, and the values become the Series values.
- A NumPy ndarray (it must be one-dimensional).

## Created from Scalar Values

### 1. Creating Series without index:

In [2]:
cities = ["Kolkata", "Durgapur", "Chittaranjan", "Jalpaiguri"]

a = pd.Series(cities)
print(a)

0         Kolkata
1        Durgapur
2    Chittaranjan
3      Jalpaiguri
dtype: object


### 2. Creating Series with index:

In [3]:
cities = ["Kolkata", "Durgapur", "Chittaranjan", "Jalpaiguri"]
index = ["k", "d", "c", "j"]

b = pd.Series(cities, index=index)
print(b)

k         Kolkata
d        Durgapur
c    Chittaranjan
j      Jalpaiguri
dtype: object


- The same index label can be repeated multiple times (unlike the keys of a Python dictionary).

In [4]:
cities = ["Kolkata", "Durgapur", "Chittaranjan", "Jalpaiguri", "Kota"]
index = ["k", "d", "c", "j", "k"]

b = pd.Series(cities, index=index)
print(b)

k         Kolkata
d        Durgapur
c    Chittaranjan
j      Jalpaiguri
k            Kota
dtype: object
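
Because labels can repeat, a label lookup does not always return a single value. The following is a minimal sketch, reusing the data from the cell above, of what a lookup returns for a repeated label versus a unique one; the variable names `dup` and `single` are introduced here only for illustration.

```python
import pandas as pd

cities = ["Kolkata", "Durgapur", "Chittaranjan", "Jalpaiguri", "Kota"]
labels = ["k", "d", "c", "j", "k"]
b = pd.Series(cities, index=labels)

# "k" appears twice, so the lookup returns a Series with both matches.
dup = b["k"]
# "d" appears once, so the lookup returns the value itself.
single = b["d"]

print(dup)
print(single)
# is_unique reports whether any label is repeated.
print(b.index.is_unique)
```

Checking `index.is_unique` before relying on label lookups is one way to avoid being surprised by a Series where a scalar was expected.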


In [5]:
a = pd.Series( (10,20,30,40) )
print( a )

0    10
1    20
2    30
3    40
dtype: int64


In [6]:
a = pd.Series( (10,20,30.5,"Hello",50) )
print( a )

0       10
1       20
2     30.5
3    Hello
4       50
dtype: object
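
The two cells above show the dtype changing with the contents. As a small sketch (the variable names are illustrative), pandas picks one dtype that can hold every element: integers stay integer, integers mixed with a float become float, and anything mixed with a string falls back to `object`.

```python
import pandas as pd

ints = pd.Series((10, 20, 30, 40))                 # all integers
floats = pd.Series((10, 20, 30.5))                 # integers and one float
mixed = pd.Series((10, 20, 30.5, "Hello", 50))     # numbers and a string

# Print the dtype chosen for each Series.
print(ints.dtype, floats.dtype, mixed.dtype)
```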


## Created from Python Dictionary:

In [8]:
data_dict = {21: "Debanjan", 11: "Atul", 24: "Anol"}

s = pd.Series( data_dict )
print(s)

21    Debanjan
11        Atul
24        Anol
dtype: object
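
When an `index` is passed together with a dictionary, pandas selects the values by matching keys, in the order given; labels that are not keys of the dictionary get `NaN`. A minimal sketch, where the label `99` is a made-up key that is deliberately absent from the dictionary:

```python
import pandas as pd

data_dict = {21: "Debanjan", 11: "Atul", 24: "Anol"}

# Pick and reorder entries by key; 99 is not in the dictionary.
picked = pd.Series(data_dict, index=[11, 21, 99])
print(picked)
```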


## Created from Numpy Array

In [10]:
a1 = np.array([
    [1,2,3],
    [4,5,6]
])

a2 = np.array([
    [11,22,33],
    [44,55,66]
])

data = a1,a2

s = pd.Series(data)
print(s)

0          [[1, 2, 3], [4, 5, 6]]
1    [[11, 22, 33], [44, 55, 66]]
dtype: object


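- Here, the tuple is treated as a sequence of two elements, and each element (a whole 2D array) is stored as a single object, which is why the dtype is `object`.
- But if we pass a single 2D array, pandas will not split it into two 1D arrays with index 0 and 1. Instead, it raises an error.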

In [11]:
a1 = np.array([
    [1,2,3],
    [4,5,6]
])

s = pd.Series(a1)
print(s)

ValueError: Data must be 1-dimensional, got ndarray of shape (2, 3) instead
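
Since a Series needs 1-dimensional data, a 2D array has to be reduced to one dimension first. The sketch below shows two possible ways, assuming either all values or just one row are wanted; which one is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

a1 = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

# Option 1: flatten the whole array into one 1D sequence of 6 values.
flat = pd.Series(a1.ravel())

# Option 2: take a single row, which is already 1-dimensional.
row0 = pd.Series(a1[0])

print(flat)
print(row0)
```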

<hr><hr><hr><hr><hr>

<center><h1>Pandas DataFrame():</h1></center>
<hr>
<ul>
    <li>With plain bracket indexing, a Pandas DataFrame selects the column first and then the row, unlike a Pandas Series (which has only a row index) and a NumPy 2D array (where the first index selects the row).</li>
    <li>This means that when accessing a value with chained brackets on a DataFrame, the left index is the column label and the right index is the row label.</li>
    <li>Ex:- <b>df[column][row]</b></li>
</ul>

A DataFrame stores relational 2D data in tabular form, i.e., as rows and columns.

## Creating DataFrame from Python Dictionary:

In [3]:
data_dict = {
    21 : "Debanjan",
    10 : "Atul",
    16 : "Sagnik"
}

df = pd.DataFrame(data_dict)
print(df)

ValueError: If using all scalar values, you must pass an index
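
The error above occurs because every value in the dictionary is a scalar, so pandas cannot tell how many rows to create. A minimal sketch of two ways around it; the column name `"name"` is an assumption chosen for illustration.

```python
import pandas as pd

data_dict = {21: "Debanjan", 10: "Atul", 16: "Sagnik"}

# Option 1: pass an index, giving one row with the keys as columns.
one_row = pd.DataFrame(data_dict, index=[0])

# Option 2: build a Series first, giving one column with the keys as rows.
as_column = pd.Series(data_dict).to_frame(name="name")

print(one_row)
print(as_column)
```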

If we nest several dictionaries inside one dictionary and pass that as the data of a DataFrame without any index, then:
- The keys of the top-level dictionary become the column names.
- The values of the top-level dictionary are the inner (lower-level) dictionaries. The values of each inner dictionary become the values of that column.
- The keys of the inner dictionaries become the row labels. Values from different inner dictionaries are placed in the same row when their keys match.

### Example 1:

In [6]:
n1 = {
    21 : "Debanjan",
    10 : "Atul",
    16 : "Sagnik",
    18 : "Anjan"
}

n2 = {
    21 : "Projna",
    10 : "Rageshree",
    16 : "Antara"
}

data = { "male" : n1, "female" : n2 }
df = pd.DataFrame( data )
print(df)

        male     female
21  Debanjan     Projna
10      Atul  Rageshree
16    Sagnik     Antara
18     Anjan        NaN
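
In the output above, key 18 exists only in `n1`, so the `female` column has no value for that row and pandas fills it with `NaN`. A short sketch, rebuilding the same DataFrame, of how such gaps could be detected:

```python
import pandas as pd

n1 = {21: "Debanjan", 10: "Atul", 16: "Sagnik", 18: "Anjan"}
n2 = {21: "Projna", 10: "Rageshree", 16: "Antara"}

df = pd.DataFrame({"male": n1, "female": n2})

# isna() marks the cells that had no matching key in the inner dictionary.
missing = df.isna()
print(missing)

# Count the missing cells per column.
print(missing.sum())
```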


### Example 2:

In [11]:
father_dict = {
    "debanjan" : "dipankar",
    "atul" : "dinesh",
    "rakesh" : "alok",
    "arpan" : "subhas",
    "arnab" : "prabir"
}

mother_dict = {
    "debanjan" : "mausumi",
    "atul" : "bina",
    "arpan" : "pamela",
    "arnab" : "roma",
    "rakesh" : "bhromor",
}

data = { "father": father_dict, "mother": mother_dict }
df = pd.DataFrame(data)

print(df)

            father   mother
debanjan  dipankar  mausumi
atul        dinesh     bina
rakesh        alok  bhromor
arpan       subhas   pamela
arnab       prabir     roma


## Creating DataFrame from Numpy ndarray:

In [13]:
arr = np.array([
    [1,2,3],
    [4,5,6]
])

df = pd.DataFrame(arr)
print(df)

   0  1  2
0  1  2  3
1  4  5  6


In [18]:
arr = np.array([
    [1,2,3],
    [4,5,6],
])

df = pd.DataFrame(arr, index=["a", "b"])
print(df)

   0  1  2
a  1  2  3
b  4  5  6


In [22]:
df[2]['a']

3
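
The chained lookup above selects column `2` first and then row `'a'`. As a sketch of the alternatives, `loc` and `iloc` take the row first and the column second, and they are generally preferred over chained brackets, especially when assigning values.

```python
import numpy as np
import pandas as pd

arr = np.array([
    [1, 2, 3],
    [4, 5, 6],
])
df = pd.DataFrame(arr, index=["a", "b"])

chained = df[2]["a"]          # column label first, then row label
by_label = df.loc["a", 2]     # row label first, then column label
by_position = df.iloc[0, 2]   # row position first, then column position

print(chained, by_label, by_position)
```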

In [24]:
arr1 = np.array([1,2,3,4,5])
arr10 = np.array([10,20,30,40,50])
arr100 = np.array([100,200,300,400,500])

data = {"1s": arr1, "10s": arr10, "100s": arr100}
df = pd.DataFrame(data)

print(df)

   1s  10s  100s
0   1   10   100
1   2   20   200
2   3   30   300
3   4   40   400
4   5   50   500


## Creating DataFrame from Pandas Series:

In [25]:
s1 = pd.Series([1,2,3,4,5])
s10 = pd.Series([10,20,30,40,50])
s100 = pd.Series([100,200,300,400,500])

data = { "s1": s1, "s10": s10, "s100": s100 }
df = pd.DataFrame(data)

print(df)

   s1  s10  s100
0   1   10   100
1   2   20   200
2   3   30   300
3   4   40   400
4   5   50   500
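
A dictionary of Series and a dictionary of NumPy arrays behave differently when the lengths do not match. The following sketch, with illustrative column names, shows that Series are aligned on their index and padded with `NaN`, while arrays of unequal length raise a `ValueError`.

```python
import numpy as np
import pandas as pd

short = pd.Series([1, 2, 3])
long = pd.Series([10, 20, 30, 40, 50])

# Series are aligned by index; rows 3 and 4 of "short" become NaN.
df = pd.DataFrame({"short": short, "long": long})
print(df)

# Plain arrays carry no index, so unequal lengths cannot be aligned.
array_error = None
try:
    pd.DataFrame({"short": np.array([1, 2, 3]),
                  "long": np.array([10, 20, 30, 40, 50])})
except ValueError as exc:
    array_error = exc
    print("ValueError:", exc)
```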


## Creating DataFrame from a CSV file: 
## `pandas.read_csv(<file_path>, header=None / int / list of ints / ...)`
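
To illustrate the `header` parameter without depending on a file on disk, here is a minimal sketch that reads a small in-memory CSV. The CSV text is made up for illustration; `io.StringIO` stands in for a file path.

```python
import io
import pandas as pd

csv_text = "id,name,mark\n1,John Deo,75\n2,Max Ruin,85\n"

# Default behaviour: the first line is used as the column names.
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: no line is treated as a header, so columns are numbered
# 0, 1, 2 and the original header line becomes the first data row.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header)
print(no_header)
```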

In [28]:
df_student = pd.read_csv("./data/02_student.csv")

print(df_student)

    id         name  class  mark  gender
0    1     John Deo   Four    75  female
1    2     Max Ruin  Three    85    male
2    3       Arnold  Three    55    male
3    4   Krish Star   Four    60  female
4    5    John Mike   Four    60  female
5    6    Alex John   Four    55    male
6    7  My John Rob  Fifth    78    male
7    8       Asruid   Five    85    male
8    9      Tes Qry    Six    78    male
9   10     Big John   Four    55  female
10  11       Ronald    Six    89  female
11  12        Recky    Six    94  female
12  13          Kty  Seven    88  female
13  14         Bigy  Seven    88  female
14  15     Tade Row   Four    88    male
15  16        Gimmy   Four    88    male
16  17        Tumyu    Six    54    male
17  18        Honny   Five    75    male
18  19        Tinny   Nine    18    male
19  20       Jackly   Nine    65  female
20  21   Babby John   Four    69  female
21  22       Reggid  Seven    55  female
22  23        Herod  Eight    79    male
23  24    Tiddy 