# What is pandas

Pandas is an open-source library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Its two core structures, Series and DataFrame, are designed for table-like data. Pandas provides tools for data manipulation:

*    reshaping
*    merging
*    sorting
*    slicing
*    aggregation
*    imputation

If you are using Anaconda, you do not have to install pandas; it is already included.

# Install pandas

```sh

## for linux & mac users

# 1. install conda with the Anaconda or Miniconda installer
#    (conda is not meant to be installed with pip)

# 2. install pandas

conda install pandas



## for windows users

# 1. install conda with the Anaconda or Miniconda installer

# 2. install pandas

pip install pandas

```

# Import pandas

In [3]:
import pandas as pd # importing pandas as pd
import numpy  as np # importing numpy as np

### Creating Pandas Series with Default Index

In [63]:
'''
## Series() builds a Series from a list or tuple and shows each value's index and the data type; a set is not accepted
'''

nums = [1, 2, 3, 4, 5]
# nums = (1, 2, 3, 4, 5)
# nums = {1, 2, 3, 4, 5} # raises TypeError: 'set' type is unordered
s = pd.Series(nums)

print(s)

0    1
1    2
2    3
3    4
4    5
dtype: int64
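
The commented-out line above notes that a set cannot be passed to `Series()` directly. A minimal sketch of one workaround, assuming sorted order is what you want, is to convert the set to a list first (the variable name `unique_nums` is only illustrative):

```python
import pandas as pd

unique_nums = {3, 1, 2, 5, 4}

# pd.Series(unique_nums) would raise TypeError because a set has no defined order,
# so turn it into a sorted list before building the Series.
s = pd.Series(sorted(unique_nums))

print(s.tolist())
```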


### Creating Pandas Series with custom index

In [66]:
nums = [1, 2, 3, 4, 5]
s = pd.Series(nums, index=[1, 2, 3, 4, 5]) # the index parameter of Series() replaces the default 0-based index with your own labels

print(s)

1    1
2    2
3    3
4    4
5    5
dtype: int64


In [8]:
fruits = ['Orange','Banana','Mango']
fruits = pd.Series(fruits, index=[1, 2, 3])

print(fruits)

1    Orange
2    Banana
3     Mango
dtype: object
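
With a custom index, a label and a position are no longer the same thing. The sketch below reuses the fruits example to show the difference between label-based lookup with `.loc` and position-based lookup with `.iloc`:

```python
import pandas as pd

fruits = pd.Series(['Orange', 'Banana', 'Mango'], index=[1, 2, 3])

first_by_label = fruits.loc[1]      # the row whose index label is 1
first_by_position = fruits.iloc[0]  # the row at position 0 (the first row)

print(first_by_label)
print(first_by_position)
```

Both lookups return the same element here because label 1 sits at position 0.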


### Creating Pandas Series from a Dictionary

In [68]:
dct = {'name':'Asabeneh','country':'Finland','city':'Helsinki'} # a dictionary
s = pd.Series(dct) # with a dictionary, Series() uses the keys as the index instead of 0, 1, 2, ...

print(s)

name       Asabeneh
country     Finland
city       Helsinki
dtype: object


### Creating a Constant Pandas Series

In [69]:
s = pd.Series(10, index = [1, 2, 3]) # the single value 10 is repeated for every index label
'''
1    10
2    10
3    10
'''
print(s)

1    10
2    10
3    10
dtype: int64


### Creating a Pandas Series Using Linspace

```python

# Syntax

pandas.Series(numpy.linspace(start, stop, number_of_values))

```

In [70]:
s = pd.Series(np.linspace(5, 20, 10)) # linspace(start, stop, number_of_values); the stop value is included

print(s)

0     5.000000
1     6.666667
2     8.333333
3    10.000000
4    11.666667
5    13.333333
6    15.000000
7    16.666667
8    18.333333
9    20.000000
dtype: float64
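
The spacing in the output above follows from the arguments: 10 values between 5 and 20 leave 9 gaps, so each step is (20 - 5) / 9. As a small sketch, `np.linspace` can report that step itself through its `retstep` parameter:

```python
import numpy as np
import pandas as pd

# retstep=True makes linspace return the values together with the step size
values, step = np.linspace(5, 20, 10, retstep=True)
s = pd.Series(values)

print(step)        # spacing between consecutive values
print(s.iloc[-1])  # the last value is the stop value itself
```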


# DataFrames

### Creating DataFrames from List of Lists

```python

# Syntax

pandas.DataFrame(data, columns=[column1, column2, column3, ...])


## DataFrame() labels the columns with the names you pass and numbers the rows with a default index


```

In [71]:
data = [
    ['Asabeneh', 'Finland', 'Helsinki'], 
    ['David', 'UK', 'London'],
    ['John', 'Sweden', 'Stockholm']
]

df = pd.DataFrame(data, columns=['Names','Country','City']) 

print(df)

      Names  Country       City
0  Asabeneh  Finland   Helsinki
1     David       UK     London
2      John   Sweden  Stockholm


### Creating DataFrame Using Dictionary

In [80]:
data = {
        'Name': ['Asabeneh', 'David', 'John'], 
        'Country':['Finland', 'UK', 'Sweden'], 
        'City': ['Helsinki', 'London', 'Stockholm']
}

df = pd.DataFrame(data) # with a dictionary you don't need to pass column names: each key becomes the column name for its list of values

print(df)

       Name  Country       City
0  Asabeneh  Finland   Helsinki
1     David       UK     London
2      John   Sweden  Stockholm


In [85]:
data = [
    {'Name': 'Asabeneh', 'Country': 'Finland', 'City': 'Helsinki'},
    {'Name': 'David', 'Country': 'UK', 'City': 'London'},
    {'Name': 'John', 'Country': 'Sweden', 'City': 'Stockholm'}
]

df = pd.DataFrame(data) # a list of dictionaries also needs no column names: each dictionary is one row and its keys become the column names

print(df)

print(type(data)) # list

       Name  Country       City
0  Asabeneh  Finland   Helsinki
1     David       UK     London
2      John   Sweden  Stockholm
<class 'list'>
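
Because each dictionary is one row, the dictionaries do not all need the same keys. The sketch below, which drops the `'City'` key from one row purely for illustration, shows that pandas fills the gap with `NaN`:

```python
import pandas as pd

data = [
    {'Name': 'Asabeneh', 'Country': 'Finland', 'City': 'Helsinki'},
    {'Name': 'David', 'Country': 'UK'},  # no 'City' key in this row
]

df = pd.DataFrame(data)

print(df)
print(df['City'].isna().tolist())  # True marks the missing value
```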


# Reading CSV File Using Pandas

```sh

# To download the CSV file used in this example, one console command is enough:

curl -O https://raw.githubusercontent.com/Asabeneh/30-Days-Of-Python/master/data/weight-height.csv

```

```python

## read a CSV file with the read_csv() function of the pandas module

pandas.read_csv('file_name.csv')

```

In [86]:
import pandas as pd

df = pd.read_csv('weight-height.csv')

print(df)

      Gender     Height      Weight
0       Male  73.847017  241.893563
1       Male  68.781904  162.310473
2       Male  74.110105  212.740856
3       Male  71.730978  220.042470
4       Male  69.881796  206.349801
...      ...        ...         ...
9995  Female  66.172652  136.777454
9996  Female  67.067155  170.867906
9997  Female  63.867992  128.475319
9998  Female  69.034243  163.852461
9999  Female  61.944246  113.649103

[10000 rows x 3 columns]
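
If the file is not downloaded yet, `read_csv()` can be tried on in-memory text instead. This is only a sketch: the two rows are the first and last rows of the output above, and wrapping the text in `io.StringIO` is a stand-in for the real file.

```python
import io
import pandas as pd

# Two rows copied from the output above, laid out like the CSV file
csv_text = """Gender,Height,Weight
Male,73.847017,241.893563
Female,61.944246,113.649103
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample)
print(sample.dtypes)  # the numeric columns are parsed as float64
```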


### Data Exploration

In [89]:
print(df.head()) # head() returns the first 5 rows by default; pass a number to get that many rows from the top

  Gender     Height      Weight
0   Male  73.847017  241.893563
1   Male  68.781904  162.310473
2   Male  74.110105  212.740856
3   Male  71.730978  220.042470
4   Male  69.881796  206.349801


In [93]:
print(df.head(10)) # the first 10 rows of the CSV file

  Gender     Height      Weight
0   Male  73.847017  241.893563
1   Male  68.781904  162.310473
2   Male  74.110105  212.740856
3   Male  71.730978  220.042470
4   Male  69.881796  206.349801
5   Male  67.253016  152.212156
6   Male  68.785081  183.927889
7   Male  68.348516  167.971110
8   Male  67.018950  175.929440
9   Male  63.456494  156.399676


In [92]:
print(df.tail()) # tail() returns the last 5 rows by default; pass a number to get that many rows from the bottom

      Gender     Height      Weight
9995  Female  66.172652  136.777454
9996  Female  67.067155  170.867906
9997  Female  63.867992  128.475319
9998  Female  69.034243  163.852461
9999  Female  61.944246  113.649103


In [None]:
print(df.tail(10)) # the last 10 rows of the CSV file

In [95]:
print(df.shape) # the shape attribute gives (rows, columns): 10000 rows and 3 columns

(10000, 3)


In [96]:
print(df.columns) # the columns attribute gives the column names as an Index

Index(['Gender', 'Height', 'Weight'], dtype='object')


In [97]:
'''
# Syntax

variable_name = pandas.read_csv('weight-height.csv')

variable_name['column_name']


## put a column name inside the brackets [] to get all the data of that column, here 'Height'
## the result is a Series: it shows the values with their index, plus the column name, length and data type
'''

heights = df['Height'] 

print(heights)

0       73.847017
1       68.781904
2       74.110105
3       71.730978
4       69.881796
          ...    
9995    66.172652
9996    67.067155
9997    63.867992
9998    69.034243
9999    61.944246
Name: Height, Length: 10000, dtype: float64
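
Single brackets with one name return a Series. As a sketch on a three-row stand-in for the CSV data (rows taken from the output above), passing a list of names inside the brackets returns a DataFrame with just those columns:

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Female'],
    'Height': [73.847017, 68.781904, 61.944246],
    'Weight': [241.893563, 162.310473, 113.649103],
})

one_column = df['Height']                # one name -> Series
two_columns = df[['Height', 'Weight']]   # list of names -> DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
```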


In [99]:
weights = df['Weight'] # select the 'Weight' column the same way; the Series shows its index, length and data type

print(weights)

0       241.893563
1       162.310473
2       212.740856
3       220.042470
4       206.349801
           ...    
9995    136.777454
9996    170.867906
9997    128.475319
9998    163.852461
9999    113.649103
Name: Weight, Length: 10000, dtype: float64


In [100]:
print(len(heights) == len(weights)) # compare the lengths of the two columns: True if they match, False if not

True


In [102]:
'''
## describe() returns the following for a numeric column:

count  # number of non-missing values
mean   # arithmetic mean of the values
std    # standard deviation of the values
min    # smallest value
25%, 50%, 75%  # percentiles: the value below which that share of the data falls (50% is the median)
max    # largest value
Name: ... # name of the column
dtype  # data type of the summary values

'''

print(heights.describe()) # statistical summary of the height data

count    10000.000000
mean        66.367560
std          3.847528
min         54.263133
25%         63.505620
50%         66.318070
75%         69.174262
max         78.998742
Name: Height, dtype: float64


In [103]:
print(weights.describe()) # statistical summary of the weight data

count    10000.000000
mean       161.440357
std         32.108439
min         64.700127
25%        135.818051
50%        161.212928
75%        187.169525
max        269.989699
Name: Weight, dtype: float64


In [104]:
print(df.describe())  # describe() also works on a whole DataFrame and summarizes every numeric column

             Height        Weight
count  10000.000000  10000.000000
mean      66.367560    161.440357
std        3.847528     32.108439
min       54.263133     64.700127
25%       63.505620    135.818051
50%       66.318070    161.212928
75%       69.174262    187.169525
max       78.998742    269.989699
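
Each row of the `describe()` output can also be computed on its own. The sketch below uses the first five heights from the file as a small stand-in and checks three of the rows against the matching Series methods:

```python
import math
import pandas as pd

heights = pd.Series([73.847017, 68.781904, 74.110105, 71.730978, 69.881796])
summary = heights.describe()

# The summary rows match the individual Series methods
print(math.isclose(summary['mean'], heights.mean()))
print(math.isclose(summary['std'], heights.std()))     # sample standard deviation
print(math.isclose(summary['50%'], heights.median()))  # 50% is the median
```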


# Modifying a DataFrame

### Creating a DataFrame

In [106]:
import pandas as pd
import numpy as np

data = [
    {"Name": "Asabeneh", "Country":"Finland","City":"Helsinki"},
    {"Name": "David", "Country":"UK","City":"London"},
    {"Name": "John", "Country":"Sweden","City":"Stockholm"}
]

df = pd.DataFrame(data) # no column names are needed for a list of dictionaries because the keys become the column names

print(df)

       Name  Country       City
0  Asabeneh  Finland   Helsinki
1     David       UK     London
2      John   Sweden  Stockholm


### Adding a New Column

```python

data_variable = ....

DataFrame_variable['column_name'] = data_variable

## Note: the new column is added after the DataFrame's existing columns

```

In [111]:
weights = [74, 78, 69]

df['Weight'] = weights
    
df

       Name  Country       City  Weight
0  Asabeneh  Finland   Helsinki      74
1     David       UK     London      78
2      John   Sweden  Stockholm      69


In [112]:
heights = [173, 175, 169]

df['Height'] = heights

print(df)

       Name  Country       City  Weight  Height
0  Asabeneh  Finland   Helsinki      74     173
1     David       UK     London      78     175
2      John   Sweden  Stockholm      69     169
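
A list assigned as a new column must have one value per row. The following sketch, on a cut-down version of the DataFrame above, shows the `ValueError` pandas raises when the lengths differ:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Asabeneh', 'David', 'John']})

try:
    df['Weight'] = [74, 78]  # only two values for three rows
except ValueError as error:
    print('Could not add column:', error)

df['Weight'] = [74, 78, 69]  # lengths match, so the column is added
print(df)
```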


### Modifying column values

```python

DataFrame['column_name'] = DataFrame['column_name'] * value # any arithmetic operator works: + - * / ** % //
...
...

```

In [114]:
df['Height'] = df['Height'] * 0.01 # convert centimetres to metres; run this cell only once

df

       Name  Country       City  Weight  Height
0  Asabeneh  Finland   Helsinki      74    1.73
1     David       UK     London      78    1.75
2      John   Sweden  Stockholm      69    1.69


In [117]:
# Wrapping the calculation in a function keeps the code tidy, but you can calculate the BMI without one
def calculate_bmi ():
    # keep each column in its own variable
    weights = df['Weight']
    heights = df['Height']
    # empty list that collects one BMI value per row
    bmi = []
    # zip() pairs each weight with the height from the same row
    # (it stops at the shorter input if the lengths differ)
    for w,h in zip(weights, heights):
        # BMI = weight in kg divided by the square of the height in metres
        b = w/(h*h)
        # add the value to the bmi list
        bmi.append(b)
    # return the complete list once the loop has finished
    return bmi

# calling the function
bmi = calculate_bmi()
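
The loop works, but pandas can also apply arithmetic to whole columns at once. As a sketch of that alternative (not the approach used in the cell above), the same BMI values come out of a single expression, assuming weights in kilograms and heights in metres:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Asabeneh', 'David', 'John'],
    'Weight': [74, 78, 69],        # kilograms
    'Height': [1.73, 1.75, 1.69],  # metres
})

# Column arithmetic is applied row by row, so no explicit loop is needed
df['BMI'] = df['Weight'] / df['Height'] ** 2

print(df['BMI'].round(1).tolist())
```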

In [120]:
# add a new column to the DataFrame that holds the values of the bmi list
df['BMI'] = bmi
# in a notebook, ending a cell with the DataFrame name renders it as a formatted table
# print() shows the same data as plain text
df

       Name  Country       City  Weight  Height        BMI
0  Asabeneh  Finland   Helsinki      74    1.73  24.725183
1     David       UK     London      78    1.75  25.469388
2      John   Sweden  Stockholm      69    1.69  24.158818


### Formatting DataFrame columns

In [122]:
'''
round(Data_Frame['column_name'], number_of_decimals)

'''

df['BMI'] = round(df['BMI'], 1) # keep one digit after the decimal point in the 'BMI' column
df

       Name  Country       City  Weight  Height   BMI
0  Asabeneh  Finland   Helsinki      74    1.73  24.7
1     David       UK     London      78    1.75  25.5
2      John   Sweden  Stockholm      69    1.69  24.2


In [138]:
birth_year = ['1769', '1985', '1990'] # list of strings

current_year = pd.Series(2020, index=[0, 1, 2]) # the value 2020 repeated for index 0, 1 and 2

df['Birth_Year'] = birth_year     # new column that holds the birth years
df['Current_Year'] = current_year # new column that holds the current year
df

       Name  Country       City  Weight  Height   BMI  Birth_Year  Current_Year
0  Asabeneh  Finland   Helsinki      74    1.73  24.7        1769          2020
1     David       UK     London      78    1.75  25.5        1985          2020
2      John   Sweden  Stockholm      69    1.69  24.2        1990          2020


### Checking data types of Column values

```python

## use the dtype attribute to check the data type of a column

DataFrame['column_name'].dtype


# a column can also be accessed as an attribute when its name is a valid Python identifier

DataFrame.column_name.dtype


```

In [130]:
print(df.Weight.dtype) # int64

df['Weight'].dtype # dtype('int64')

int64


dtype('int64')

In [139]:
## Note: attribute access (df.column_name) does not work when the column name contains a space
# use bracket access (df['column name']) or give the column a one-word name, for example with an underscore as below

df['Birth_Year'].dtype # the values are strings, so we should convert them to numbers
## dtype('O') # 'O' means object, the dtype pandas uses here for strings

dtype('O')

In [142]:
'''
## change data type of column

DataFrame['column_name'] = DataFrame['column_name'].astype('data_type')

'''

df['Birth_Year'] = df['Birth_Year'].astype('int') # 'int' gives int64 here, as the output below shows
# df['Birth_Year'] = df['Birth_Year'].astype('int32') # specify the number of bits explicitly

print(df['Birth_Year'].dtype) # let's check the data type now

int64


In [144]:
df['Current_Year'] = df['Current_Year'].astype('int')

df['Current_Year'].dtype

dtype('int64')
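
`astype('int')` fails with a `ValueError` as soon as one string is not a number. A hedged sketch of one way around that is `pd.to_numeric` with `errors='coerce'`; the `'unknown'` entry below is made up for illustration and does not appear in the data above:

```python
import pandas as pd

birth_year = pd.Series(['1769', '1985', 'unknown'])

# errors='coerce' turns entries that cannot be parsed into NaN instead of raising
converted = pd.to_numeric(birth_year, errors='coerce')

print(converted)
print(converted.isna().tolist())
```

The result is a float column because `NaN` is a floating-point value.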

In [145]:
ages = df['Current_Year'] - df['Birth_Year'] # subtract one column from the other, row by row

ages

0    251
1     35
2     30
dtype: int64

In [146]:
# store the result of the subtraction in a new column 'Ages'
df['Ages'] = ages

print(df)

       Name  Country       City  Weight  Height   BMI  Birth_Year  \
0  Asabeneh  Finland   Helsinki      74    1.73  24.7        1769   
1     David       UK     London      78    1.75  25.5        1985   
2      John   Sweden  Stockholm      69    1.69  24.2        1990   

   Current_Year  Ages  
0          2020   251  
1          2020    35  
2          2020    30  


In [147]:
mean = (35 + 30) / 2 # mean of the two plausible ages; 251 is left out as an outlier

print('Mean: ',mean)    # it is good to add some description to the output

Mean:  32.5
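
The mean above is typed in by hand. As a sketch, the same number can be computed from the ages themselves by filtering first, assuming (as the hand calculation implies) that ages of 120 or more should be left out:

```python
import pandas as pd

ages = pd.Series([251, 35, 30])

# Keep only the ages below the assumed cut-off of 120, then average them
plausible = ages[ages < 120]
mean = plausible.mean()

print('Mean: ', mean)
```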


### Boolean Indexing

```python

## select the rows that match your condition

DataFrame[DataFrame['column_name'] > value]
DataFrame[DataFrame['column_name'] < value]
DataFrame[DataFrame['column_name'] != value]
DataFrame[DataFrame['column_name'] == value]
...

## combine conditions with | (or) and & (and); wrap each condition in parentheses
DataFrame[(DataFrame['column_name'] > value) | (DataFrame['column_name'] == value)]
DataFrame[(DataFrame['column_name'] >= value) & (DataFrame['other_column'] == value)]
...
...

```

In [55]:
print(df[df['Ages'] > 120])

       Name  Country      City  Weight  Height   BMI  Birth_Year  \
0  Asabeneh  Finland  Helsinki      74    1.73  24.7        1769   

   Current_Year  Ages  
0          2020   251  


In [56]:
print(df[df['Ages'] < 120])

    Name Country       City  Weight  Height   BMI  Birth_Year  Current_Year  \
1  David      UK     London      78    1.75  25.5        1985          2020   
2   John  Sweden  Stockholm      69    1.69  24.2        1990          2020   

   Ages  
1    35  
2    30  
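
Conditions can also be combined. Python's `and` and `or` do not work on whole columns, so pandas uses `&` and `|`, with each condition in its own parentheses. The sketch below uses a cut-down copy of the DataFrame; the weight threshold of 75 is an arbitrary value chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Asabeneh', 'David', 'John'],
    'Weight': [74, 78, 69],
    'Ages': [251, 35, 30],
})

# & keeps rows where both conditions hold
young_and_light = df[(df['Ages'] < 120) & (df['Weight'] < 75)]

# | keeps rows where at least one condition holds
old_or_heavy = df[(df['Ages'] > 120) | (df['Weight'] > 75)]

print(young_and_light['Name'].tolist())
print(old_or_heavy['Name'].tolist())
```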
