# Tabular Dataset Example
![image.png](image_2.png)

# Datasets in Python
- 2D NumPy arrays?
    - Not the right choice: a NumPy array holds a single data type, whereas a tabular dataset usually mixes several types (for example, text and numbers).

- pandas
    - pandas is a high-level data manipulation tool created by Wes McKinney in 2008. It is built on top of NumPy and, compared to NumPy, is more user-friendly for tabular data.
    - In pandas, tabular data is stored in a DataFrame.
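The single-dtype limitation mentioned above can be seen directly. The following is a minimal sketch; the variable names `mixed` and `df` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# A NumPy array holds one dtype, so mixing text and numbers
# coerces every element to a string.
mixed = np.array([["Brazil", 8.516], ["Russia", 17.10]])
print(mixed.dtype.kind)  # 'U' stands for a unicode string dtype

# A DataFrame keeps a separate dtype for each column.
df = pd.DataFrame({"country": ["Brazil", "Russia"], "area": [8.516, 17.10]})
print(df.dtypes)
```

Because the area values become strings in the array, arithmetic on them would require converting back, which is the kind of friction a DataFrame avoids.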

In [2]:
import pandas as pd

brics = pd.read_csv('https://assets.datacamp.com/production/repositories/287/datasets/b60fb5bdbeb4e4ab0545c485d351e6ff5428a155/brics.csv', index_col=0)
print(brics)

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98


# DataFrame from a dictionary

In [3]:
# Avoid naming this variable `dict`, which would shadow the built-in type
data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    "population": [200.4, 143.50, 1252.00, 1357.00, 52.98],
}


- keys (column labels)
- values (data, column by column)

In [4]:
import pandas as pd
brics_dict = pd.DataFrame(data)

print(brics_dict)

        country    capital    area  population
0        Brazil   Brasilia   8.516      200.40
1        Russia     Moscow  17.100      143.50
2         India  New Delhi   3.286     1252.00
3         China    Beijing   9.597     1357.00
4  South Africa   Pretoria   1.221       52.98


In [5]:
# Set the index attribute to the correct row labels
brics_dict.index = ['BR', 'RU', 'IN', 'CH', 'SA']
print(brics_dict)

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
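
As an alternative to assigning the index afterwards, the row labels can be passed when the DataFrame is built, using the `index` parameter of `pd.DataFrame`. This is a small sketch; the names `data`, `labels`, and `brics_labeled` are chosen here for illustration.

```python
import pandas as pd

data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    "population": [200.4, 143.50, 1252.00, 1357.00, 52.98],
}
labels = ["BR", "RU", "IN", "CH", "SA"]

# Passing index= at construction replaces the default 0..4 index
# in one step, with no separate assignment needed.
brics_labeled = pd.DataFrame(data, index=labels)
print(brics_labeled)
```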


# DataFrame from CSV File

In [6]:
import pandas as pd

brics = pd.read_csv('https://assets.datacamp.com/production/repositories/287/datasets/b60fb5bdbeb4e4ab0545c485d351e6ff5428a155/brics.csv', index_col=0)
print(brics)

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98
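
To see what `index_col=0` does without relying on a network connection, the sketch below reads the same table from an in-memory string. The CSV layout in `csv_text` (an empty first header field followed by the four column names) is an assumption about how such a file might look, not a copy of the actual file.

```python
import io
import pandas as pd

# Stand-in for brics.csv (assumed layout). The first header field is
# empty because that column holds the row labels.
csv_text = """,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
RU,Russia,Moscow,17.1,143.5
IN,India,New Delhi,3.286,1252.0
CH,China,Beijing,9.597,1357.0
SA,South Africa,Pretoria,1.221,52.98
"""

# Without index_col, pandas adds a default integer index and keeps
# the labels as an ordinary column named "Unnamed: 0".
default_index = pd.read_csv(io.StringIO(csv_text))
print(default_index.columns.tolist())

# With index_col=0, the first column becomes the row index.
labeled = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(labeled.index.tolist())
```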


# Indexing and selecting data
- Square brackets []
- Advanced methods
    - .loc[]
    - .iloc[]

# Column Access []

In [7]:
print(brics["country"])

BR          Brazil
RU          Russia
IN           India
CH           China
SA    South Africa
Name: country, dtype: object


In [None]:
print(type(brics["country"])) # 1D labelled array

<class 'pandas.core.series.Series'>


In [10]:
print(brics[["country"]])

         country
BR        Brazil
RU        Russia
IN         India
CH         China
SA  South Africa


In [11]:
print(type(brics[["country"]]))

<class 'pandas.core.frame.DataFrame'>


In [12]:
print(brics[["country", "capital"]])

         country    capital
BR        Brazil   Brasilia
RU        Russia     Moscow
IN         India  New Delhi
CH         China    Beijing
SA  South Africa   Pretoria


# Row Access []

In [13]:
print(brics[1:4])

   country    capital    area  population
RU  Russia     Moscow  17.100       143.5
IN   India  New Delhi   3.286      1252.0
CH   China    Beijing   9.597      1357.0


# Discussion []
- Square brackets: limited functionality
- Ideally, we want indexing similar to 2D NumPy arrays
    - my_array[row, column]
- pandas
    - loc (label-based)
    - iloc (integer position-based)

# Row Access loc

In [None]:
print(brics.loc["RU"]) # Row as pandas Series

country       Russia
capital       Moscow
area            17.1
population     143.5
Name: RU, dtype: object


In [15]:
print(brics.loc[["RU"]]) # DataFrame

   country capital  area  population
RU  Russia  Moscow  17.1       143.5


In [16]:
print(brics.loc[['RU','IN','CH']])

   country    capital    area  population
RU  Russia     Moscow  17.100       143.5
IN   India  New Delhi   3.286      1252.0
CH   China    Beijing   9.597      1357.0


# Row & Column loc

In [17]:
print(brics.loc[["BR", "RU", "IN"], ["country", "capital"]])

   country    capital
BR  Brazil   Brasilia
RU  Russia     Moscow
IN   India  New Delhi


In [18]:
print(brics.loc[:, ['country', 'capital']])

         country    capital
BR        Brazil   Brasilia
RU        Russia     Moscow
IN         India  New Delhi
CH         China    Beijing
SA  South Africa   Pretoria
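
Beyond lists of labels, `loc` also accepts a single row label with a single column label, and label slices. The sketch below rebuilds `brics` from a dictionary so that it runs on its own; the names `cell` and `block` are illustrative.

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.50, 1252.00, 1357.00, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# One row label and one column label return a single value.
cell = brics.loc["RU", "capital"]
print(cell)

# A list of row labels can be combined with a column slice.
# Label slices in loc include both endpoints, so "area" is kept.
block = brics.loc[["RU", "IN"], "country":"area"]
print(block)
```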


# Recap
- Square brackets
    - Column access: brics[["country", "capital"]]
    - Row access: only through slicing, brics[0:2]
- loc (label-based)
    - Row access: brics.loc[["RU", "IN", "CH"]]
    - Column access: brics.loc[:, ["country", "capital"]]
    - Row and column access: brics.loc[["RU", "IN", "CH"], ["country", "capital"]]

# Row Access iloc


In [19]:
print(brics.iloc[[1]])

   country capital  area  population
RU  Russia  Moscow  17.1       143.5


In [20]:
print(brics.iloc[[1, 2, 3]])

   country    capital    area  population
RU  Russia     Moscow  17.100       143.5
IN   India  New Delhi   3.286      1252.0
CH   China    Beijing   9.597      1357.0


In [21]:
print(brics.iloc[[1, 2, 3], [0, 1]])

   country    capital
RU  Russia     Moscow
IN   India  New Delhi
CH   China    Beijing


In [None]:
print(brics.iloc[:, [0, 1]])

         country    capital
BR        Brazil   Brasilia
RU        Russia     Moscow
IN         India  New Delhi
CH         China    Beijing
SA  South Africa   Pretoria
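
One difference worth keeping in mind when switching between the two accessors: slices in `iloc` exclude the end position, as in ordinary Python slicing, while label slices in `loc` include the end label. The sketch below rebuilds `brics` so it is self-contained and checks that the two selections match; `by_position` and `by_label` are illustrative names.

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.50, 1252.00, 1357.00, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# iloc: rows at positions 1..3 and columns at positions 0..1 (ends excluded).
by_position = brics.iloc[1:4, 0:2]

# loc: rows "RU" through "CH" and columns "country" through "capital"
# (ends included).
by_label = brics.loc["RU":"CH", "country":"capital"]

# Both selections pick the same rows and columns.
print(by_position.equals(by_label))
```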

