# Pandas Tutorial

## what is pandas?

```
source : https://pandas.pydata.org/docs/getting_started/overview.html

Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis/manipulation tool available in any language. It is already well on its way toward this goal.

```

# 1. loading modules

In [1]:
from IPython.display import Image
import pandas as pd #pd is an alias for pandas
import numpy as np #np is an alias for numpy
import matplotlib.pyplot as plt #plt is an alias for pyplot

#alias : The community agreed alias for pandas is pd, so loading pandas as pd is assumed standard practice for all of the pandas documentation.

#run this once if pandas is not installed yet; the imports above only work after it succeeds
!pip install pandas

```
source : https://pandas.pydata.org/docs/getting_started/overview.html

two primary data structures of pandas
    1. Series : 1D labeled array
    2. DataFrame : general 2D labeled structure
```

![image](./images/SeriesandDataframe.png)

source : https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/
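The difference between the two structures can be sketched in a few lines. This is a minimal illustration, and the names `legs` and `animals` and their values are made up for this example rather than taken from the sources above.

```python
import pandas as pd

# A Series is one-dimensional: one set of values plus an index of labels.
legs = pd.Series([4, 2, 6], index=["dog", "bird", "ant"], name="legs")

# A DataFrame is two-dimensional: several columns that share one index.
animals = pd.DataFrame(
    {"legs": [4, 2, 6], "can_fly": [False, True, False]},
    index=["dog", "bird", "ant"],
)

print(legs.ndim, legs.shape)        # 1 (3,)
print(animals.ndim, animals.shape)  # 2 (3, 2)

# Selecting a single column of a DataFrame gives back a Series.
print(type(animals["legs"]).__name__)  # Series
```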

# 2. creating a Series

## 2-1. using list

In [2]:
sr1 = pd.Series([1, 2, 3, 4])
sr1

0    1
1    2
2    3
3    4
dtype: int64

**values of series**

In [3]:
sr1.values

array([1, 2, 3, 4])

**indices of series**

In [4]:
sr1.index

RangeIndex(start=0, stop=4, step=1)

**setting indices**

In [7]:
sr2 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
sr2

a    1
b    2
c    3
d    4
dtype: int64

In [8]:
sr2.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [12]:
values = [100, 200, 300, 400, 500]
index = ['apple', 'banana', 'watermelon', 'grapes', 'orange']
sr3 = pd.Series(values, index=index)
sr3

apple         100
banana        200
watermelon    300
grapes        400
orange        500
dtype: int64

**changing indices**

In [13]:
sr3.index = ['Seoul', 'Pusan', 'Daegu', 'Daejeon', 'Incheon']
sr3

Seoul      100
Pusan      200
Daegu      300
Daejeon    400
Incheon    500
dtype: int64
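Assigning to `.index` replaces every label at once, so the new list must have exactly one label per value. When only some labels should change, `rename` is one alternative. The sketch below uses a shortened, hypothetical series `sr` rather than `sr3`.

```python
import pandas as pd

sr = pd.Series([100, 200, 300], index=["apple", "banana", "grapes"])

# rename() returns a new Series; labels missing from the mapping stay as they are.
renamed = sr.rename({"apple": "Seoul", "banana": "Pusan"})
print(list(renamed.index))  # ['Seoul', 'Pusan', 'grapes']
print(list(sr.index))       # ['apple', 'banana', 'grapes'] (original untouched)

# Assigning a list of the wrong length to .index raises a ValueError.
try:
    sr.index = ["Seoul", "Pusan"]  # only 2 labels for 3 values
except ValueError as err:
    print("ValueError:", err)
```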

## 2-2. using dictionary

In [19]:
sdata = {'Kim' : 100, 'Park' : 200, 'Lee' : 300, 'Choi' : 400}
sr4 = pd.Series(sdata)
sr4

Kim     100
Park    200
Lee     300
Choi    400
dtype: int64
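When a dictionary is combined with an explicit `index`, pandas looks each label up in the dictionary instead of using all keys. A small sketch, where `'Jung'` is a hypothetical label that is deliberately missing from `sdata`:

```python
import pandas as pd

sdata = {"Kim": 100, "Park": 200, "Lee": 300, "Choi": 400}

# Without an index, the keys become the labels in insertion order.
sr = pd.Series(sdata)
print(list(sr.index))  # ['Kim', 'Park', 'Lee', 'Choi']

# With an index, only the listed keys are taken, in the listed order.
# 'Jung' is not a key of sdata, so its value is NaN (missing).
picked = pd.Series(sdata, index=["Lee", "Kim", "Jung"])
print(picked)
print(picked.isna().tolist())  # [False, False, True]
```

Because NaN is a floating-point value, the resulting series is stored as floats rather than integers.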

# 3. access data from Series
source : https://www.tutorialspoint.com/python_pandas/python_pandas_series.htm

## 3-1. single data

In [35]:
sr2 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']) #series from above
sr2

a    1
b    2
c    3
d    4
dtype: int64

In [37]:
sr2[0]

1

In [38]:
sr2[3]

4

In [39]:
sr2[6]

IndexError: index 6 is out of bounds for axis 0 with size 4

In [40]:
sr2[-1]

4
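The cells above use integer positions with plain square brackets on a series whose labels are strings. Newer pandas versions may emit a FutureWarning for that form, so a more explicit way to write the same lookups is `.iloc` for positions and `.loc` for labels. A minimal sketch:

```python
import pandas as pd

sr2 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# .iloc selects by integer position (negative positions count from the end).
print(sr2.iloc[0])   # 1
print(sr2.iloc[-1])  # 4

# .loc (or plain brackets with a label) selects by index label.
print(sr2.loc["a"])  # 1
print(sr2["d"])      # 4

# An out-of-range position still raises IndexError.
try:
    sr2.iloc[6]
except IndexError as err:
    print("IndexError:", err)
```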

## 3-2. a range of data

In [41]:
sr2[:2]

a    1
b    2
dtype: int64

In [42]:
sr2[:3]

a    1
b    2
c    3
dtype: int64

In [43]:
sr2[:]

a    1
b    2
c    3
d    4
dtype: int64
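Slices can also be written with labels instead of positions. One difference is worth keeping in mind: a position slice excludes its end point, while a label slice includes it. A short sketch on the same `sr2`:

```python
import pandas as pd

sr2 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# Position slice: the stop position is excluded, so this returns 'a' and 'b'.
by_position = sr2.iloc[0:2]
print(list(by_position.index))  # ['a', 'b']

# Label slice: the stop label is included, so this returns 'a', 'b' and 'c'.
by_label = sr2.loc["a":"c"]
print(list(by_label.index))  # ['a', 'b', 'c']

# A list of labels picks arbitrary, non-adjacent entries.
picked = sr2.loc[["a", "d"]]
print(picked.tolist())  # [1, 4]
```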

## 2-3. Quiz

1. Return the values of sr4.
2. Return the indices of sr4.
3. Change the index labels of sr4 to green, yellow, blue, red, then print sr4.
4. Create the series shown below from a dictionary. Name it sr5 and print the result.

![image](./images/quiz1-4.png)

sdata2 = {1 : 'Thomas', 2 : 'Jane', 3 : 'Edward', 4 : 'Jessica', 5 : 'Irene'}
sr5 = pd.Series(sdata2)
sr5

# 3. creating DataFrames

You can construct a DataFrame from any of the following:
1. list
2. dictionary
3. series
4. numpy ndarray
5. DataFrame

source : https://wikidocs.net/32829
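Lists and dictionaries are covered in the subsections that follow. For the remaining three inputs (Series, NumPy ndarray, and an existing DataFrame), a minimal sketch could look like this. All variable names and the column names `x`, `y`, `z` are made up for the illustration.

```python
import numpy as np
import pandas as pd

# From Series: a dict of Series, where each key becomes a column name
# and rows are aligned on the shared index.
species = pd.Series(["dog", "bird"], index=[1000, 1002])
legs = pd.Series([4, 2], index=[1000, 1002])
df_from_series = pd.DataFrame({"Species": species, "Number of legs": legs})
print(df_from_series)

# From a 2D NumPy ndarray: each row of the array becomes a row of the frame.
arr = np.array([[1, 2, 3], [4, 5, 6]])
df_from_array = pd.DataFrame(arr, columns=["x", "y", "z"])
print(df_from_array)

# From another DataFrame: copy=True makes an independent copy,
# so editing the copy leaves the original unchanged.
df_copy = pd.DataFrame(df_from_array, copy=True)
df_copy.loc[0, "x"] = 99
print(df_from_array.loc[0, "x"])  # 1
```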

## 3-1. using list

In [30]:
ddata = [
        [1000, 'dog', 4], 
        [1001, 'cow', 4], 
        [1002, 'bird', 2], 
        [1003, 'ant', 6], 
        [1004, 'spider', 8],
        [1005, 'cat', 4]
        ]
df1 = pd.DataFrame(ddata)
print(df1)

      0       1  2
0  1000     dog  4
1  1001     cow  4
2  1002    bird  2
3  1003     ant  6
4  1004  spider  8
5  1005     cat  4


**setting column names**

In [32]:
df1 = pd.DataFrame(ddata, columns=['Number', 'Species', 'Number of legs'])
print(df1)

   Number Species  Number of legs
0    1000     dog               4
1    1001     cow               4
2    1002    bird               2
3    1003     ant               6
4    1004  spider               8
5    1005     cat               4
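Column names do not have to be given at construction time. As a sketch, they can also be assigned or renamed afterwards. The two-row `ddata` here is a shortened version of the list above, and `'Legs'` is a hypothetical new name.

```python
import pandas as pd

ddata = [[1000, "dog", 4], [1001, "cow", 4]]
df = pd.DataFrame(ddata)
print(list(df.columns))  # [0, 1, 2]

# Replace all column labels at once; the list needs one name per column.
df.columns = ["Number", "Species", "Number of legs"]

# Rename only selected columns; rename() returns a new DataFrame.
df_short = df.rename(columns={"Number of legs": "Legs"})
print(list(df_short.columns))  # ['Number', 'Species', 'Legs']
print(list(df.columns))        # ['Number', 'Species', 'Number of legs']
```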


## 3-2. using dictionary

In [33]:
ddata2 = {'Number' : [1000, 1001, 1002, 1003, 1004, 1005],
          'Species' : ['dog', 'cow', 'bird', 'ant', 'spider', 'cat'],
          'Number of legs' : [4, 4, 2, 6, 8, 4]}
df2 = pd.DataFrame(ddata2)
print(df2)

   Number Species  Number of legs
0    1000     dog               4
1    1001     cow               4
2    1002    bird               2
3    1003     ant               6
4    1004  spider               8
5    1005     cat               4
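With a dictionary, each key becomes a column name and each list becomes that column's values, so all lists need the same length. A short sketch with a trimmed copy of `ddata2`, where `bad` is a deliberately broken, hypothetical input:

```python
import pandas as pd

ddata2 = {"Number": [1000, 1001, 1002],
          "Species": ["dog", "cow", "bird"],
          "Number of legs": [4, 4, 2]}
df2 = pd.DataFrame(ddata2)

# Keys become the column names, in insertion order.
print(list(df2.columns))  # ['Number', 'Species', 'Number of legs']

# Each column keeps its own data type.
print(df2.dtypes)

# Lists of unequal length cannot be lined up into rows.
bad = {"Number": [1000, 1001], "Species": ["dog"]}
try:
    pd.DataFrame(bad)
except ValueError as err:
    print("ValueError:", err)
```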
