
## What is **Pandas**?

- An open-source Python library that provides data structures and data analysis tools.
- Provides data wrangling, processing, and modeling capabilities for Python comparable to those of the R language.
- Highly optimized for performance: its core data structures are built on NumPy arrays and support vectorized operations.


### Structured vs Unstructured data

- Unstructured data: such as raw data obtained by web-scraping, log files, etc.
- Structured data: such as data contained in CSV files, Excel spreadsheets, SQL tables, etc.

## Pandas **Series**
- Similar to a one-dimensional NumPy array.
- Has attributes such as index, name, etc.
- Supports vectorized operations.
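
As a minimal sketch of the three points above, the snippet below wraps a NumPy array in a Series, reads its `index` and `name` attributes, and applies an operation to all elements at once. The values, labels, and the name `demo` are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])                               # one-dimensional NumPy array
s = pd.Series(arr, index=["a", "b", "c"], name="demo")  # same values, now with labels and a name

doubled = s * 2               # vectorized: applied to every element, no explicit loop

print(s.index.tolist())       # the labels attached to the values
print(s.name)                 # the name given at creation
print(doubled)
```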

In [5]:
import pandas as pd
import numpy as np

In [70]:
data = {"Brown":[220], "Blue":[215], "Hazel":[93], "Green":[64]}
df = pd.DataFrame(data)
type(df)    # 'df' is a DataFrame object.

pandas.core.frame.DataFrame

In [73]:
type(df['Brown'])    # A column of 'df' is a Series object.

pandas.core.series.Series

In [18]:
my_data = np.array([220, 215, 93,64])
eye = pd.Series(data=my_data, index=['Brown','Blue','Hazel','Green'], name='Eye color count')    # Create a new Series object.
eye

Brown    220
Blue     215
Hazel     93
Green     64
Name: Eye color count, dtype: int64
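
A Series can also be built directly from a Python dictionary, in which case the keys become the index labels. The sketch below reuses the eye color counts from above; the variable name `eye_from_dict` is only illustrative.

```python
import pandas as pd

# Dictionary keys become the index, dictionary values become the data.
counts = {"Brown": 220, "Blue": 215, "Hazel": 93, "Green": 64}
eye_from_dict = pd.Series(counts, name="Eye color count")

print(eye_from_dict)
```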

**Series** attributes and methods:

In [19]:
ser = eye

In [20]:
print(ser.name)			# Name attribute of the Series object 'ser'.

Eye color count


In [21]:
ser.index			# Index attribute of the Series object 'ser'.

Index(['Brown', 'Blue', 'Hazel', 'Green'], dtype='object')

In [22]:
ser.values			# Values of the Series given as a NumPy array.

array([220, 215,  93,  64])

In [23]:
ser.sort_values()			# Sort the Series values from the smallest to the largest.

Green     64
Hazel     93
Blue     215
Brown    220
Name: Eye color count, dtype: int64

In [24]:
ser.sort_index()			# Sort the Series by its index labels in ascending order (alphabetical for strings).

Blue     215
Brown    220
Green     64
Hazel     93
Name: Eye color count, dtype: int64

In [25]:
ser.nunique()			# Number of unique values in the Series.

4

In [26]:
ser.unique()			# Unique values of the Series given as a NumPy array

array([220, 215,  93,  64])

In [83]:
rser = pd.Series(np.random.randint(1,10,50))    # 50 random integers between 1 and 9
rser.value_counts()			# Returns a frequency table.

5    9
9    7
8    7
7    7
6    6
2    5
1    5
4    2
3    2
dtype: int64
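
Because the random example above changes on every run, here is a small deterministic sketch of the same frequency table. It also shows two common variations: relative frequencies via `normalize=True`, and ordering the table by value with `sort_index()`. The input values are made up for illustration.

```python
import pandas as pd

s = pd.Series([3, 1, 3, 2, 3, 2])

counts = s.value_counts()                 # absolute frequencies, most frequent value first
shares = s.value_counts(normalize=True)   # relative frequencies that sum to 1
by_value = counts.sort_index()            # same table, ordered by value instead of by count

print(counts)
print(shares)
print(by_value)
```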

In [74]:
ser = pd.Series([0,10,20,30,40], index = ['a','b','c','d','e'])	# Create a Series object.

In [75]:
ser.iloc[1]     # Element at position 1 (positions start at 0); plain ser[1] is deprecated for a non-integer index.

10

In [77]:
ser['b']     # Element with the label 'b'.

10

In [78]:
ser[1:3]     # Get the elements at positions 1 and 2 (the end position 3 is excluded).

b    10
c    20
dtype: int64
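
The lookups above mix position-based and label-based access through the same square brackets. A more explicit way to write them, sketched below on the same data, is to use `.iloc` for positions and `.loc` for labels. Note that a label slice includes its end label, whereas a position slice excludes its end position.

```python
import pandas as pd

s = pd.Series([0, 10, 20, 30, 40], index=["a", "b", "c", "d", "e"])

by_position = s.iloc[1]         # second element, selected by position
by_label = s.loc["b"]           # element whose label is 'b'
pos_slice = s.iloc[1:3]         # positions 1 and 2 (end position excluded)
label_slice = s.loc["b":"c"]    # labels 'b' through 'c' (end label included)

print(by_position, by_label)
print(pos_slice)
print(label_slice)
```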

In [93]:
ser1 = pd.Series([0,1,2,3,4], index = [0,1,2,3,4]) 	# Create a Series.
ser2 = pd.Series([0,1,2,3,4], index = [4,3,2,1,0]) 	# Create a Series.
ser1 + ser2                              			# Element-wise addition matched by the index.

0    4
1    4
2    4
3    4
4    4
dtype: int64
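
Index alignment also determines what happens when the two Series do not share all of their labels. The sketch below, with made-up values, shows that an unmatched label produces `NaN` under plain `+`, and that `Series.add` with `fill_value=0` treats the missing entry as zero instead.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])

plain = a + b                     # 'x' has no partner in b, so the result there is NaN
filled = a.add(b, fill_value=0)   # the missing 'x' in b is treated as 0

print(plain)
print(filled)
```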

In [89]:
ser3 = ser1 * ser2             # Element-wise multiplication matched by the index.
ser3

0    0
1    3
2    4
3    3
4    0
dtype: int64

In [95]:
ser4 = ser1/2                         # Divide a Series by a scalar value.
ser4

0    0.0
1    0.5
2    1.0
3    1.5
4    2.0
dtype: float64

In [98]:
ser_height = pd.Series([165.3, 170.1, 175.0, 182.1, 168.0, 162.0, 155.2, 176.9, 178.5, 176.1, 167.1, 180.0, 162.2, 176.1, 158.2, 168.6, 169.2])
print(ser_height)
#ser_height.apply(lambda x: x/100)            # Apply a lambda function element-wise.

0     165.3
1     170.1
2     175.0
3     182.1
4     168.0
5     162.0
6     155.2
7     176.9
8     178.5
9     176.1
10    167.1
11    180.0
12    162.2
13    176.1
14    158.2
15    168.6
16    169.2
dtype: float64
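
To show what the commented-out `apply` call would do, the sketch below runs it on the first three heights and compares it with the equivalent vectorized division. For simple arithmetic like this, the vectorized form is usually preferable; `apply` is more useful when the function cannot be expressed as a built-in Series operation.

```python
import pandas as pd

heights = pd.Series([165.3, 170.1, 175.0])

via_apply = heights.apply(lambda x: x / 100)   # calls the lambda once per element
via_vector = heights / 100                     # same computation as one vectorized operation

print(via_apply)
print(via_vector)
```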


Time for exercise **'ex_0204.ipynb'**