### Intro to Pandas DataFrames


import pandas as pd

First, take a look at the dataset:

![tech_table.png](attachment:tech_table.png)

# Creating an empty DataFrame
df = pd.DataFrame()

print(df)

Now, let's add our data:

# A dictionary of column lists (the values) and a list of row labels (the index)
data = {'Revenue': [274515,200734,182527,181945,143015,129184,92224,85965,84893,
                    82345,77867,73620,69864,63191],
        'Employees': [147000,267937,135301,878429,163000,197000,158000,58604,
                      109700,350864,110600,364800,85858,243540],
        'Sector': ['Consumer Electronics','Consumer Electronics','Software Services',
                   'Chip Manufacturing','Software Services','Consumer Electronics',
                   'Consumer Electronics','Software Services','Consumer Electronics',
                   'Consumer Electronics','Chip Manufacturing','Software Services',
                   'Software Services','Consumer Electronics'],
        'Founding Date':['01-04-1976','13-01-1969','04-09-1998','20-02-1974',
                         '04-04-1975','15-09-1987','01-02-1984','04-02-2004',
                         '07-04-1946','01-01-1910','18-07-1968','16-06-1911',
                         '11-11-1998','07-03-1918'],
        'Country':['USA','South Korea','USA','Taiwan','USA','China','USA','USA',
                   'Japan','Japan','USA','USA','China','Japan']} 
index = ['Apple','Samsung','Alphabet','Foxconn','Microsoft','Huawei',
         'Dell Technologies','Meta','Sony','Hitachi','Intel','IBM',
         'Tencent','Panasonic']

# Creating a DataFrame with our data, using the company names as the index
df = pd.DataFrame(data, index=index)

# Let's see our dataframe
df

### Basic Navigation & Browsing Techniques 

a) `head()`

# First 5 rows by default
df.head()

# df.head(n)
df.head(2)

b) `tail()`

# Last 5 rows by default
df.tail()

# df.tail(n)
df.tail(3)

c) `info()`

# Index, column names, non-null counts, dtypes and memory usage
df.info()

d) `shape`

# (number of rows, number of columns)
df.shape

e) `describe()`

# Summary statistics for the numeric columns, transposed so each column becomes a row
df.describe().T

f) `nunique()`

# Number of distinct values in each column
df.nunique()

g) `isnull()`

# Boolean DataFrame: True wherever a value is missing
df.isnull()

# Applying sum() counts the missing values in each column
df.isnull().sum()
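
Our dataset has no missing values, so the counts above are all zero. As a minimal sketch of what the same call reports when data is missing, here is a hypothetical toy frame (the name `toy` and its values are made up for illustration and are not part of the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column (not from the dataset above)
toy = pd.DataFrame(
    {"Revenue": [100.0, np.nan, 300.0], "Country": ["USA", "Japan", None]},
    index=["A", "B", "C"],
)

# isnull() marks each missing cell as True; sum() then counts the True values per column
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```

Because `True` counts as 1 when summed, each entry of the result is the number of missing cells in that column.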

**Check your knowledge:**

##### 1. Output the first four rows of the df

head_first_4 = df.head(4)

##### 2. Output the last six rows of the df

tail_last_6 = df.tail(6)

### Column Selection 

Let's now look into how we can select specific data.

**a) Select One Column**

#name_of_df["name_of_column"]

df['Revenue']

**b) Select One Column and Apply Methods**

With `numeric` datatypes:

# Find the lowest revenue 
df['Revenue'].min()

# Find the highest revenue
df['Revenue'].max()

# Find the average revenue
df['Revenue'].mean()

# Find the average revenue rounded to the nearest whole number 
round(df['Revenue'].mean())

# Find the median revenue rounded to the nearest whole number   
round(df['Revenue'].median())

With `string` datatypes:

df['Sector'].min()

df['Sector'].max()

df['Sector'].count()

df['Sector'].nunique()
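
On a string column, `min()` and `max()` compare values alphabetically rather than numerically, `count()` returns the number of non-missing entries, and `nunique()` the number of distinct ones. A small sketch on a standalone Series (the name `sectors` and its four entries are chosen only for illustration):

```python
import pandas as pd

# A short, standalone Series of sector names (illustrative subset)
sectors = pd.Series(
    ["Consumer Electronics", "Software Services",
     "Chip Manufacturing", "Software Services"]
)

first_alphabetically = sectors.min()   # string that sorts first
last_alphabetically = sectors.max()    # string that sorts last
n_values = sectors.count()             # non-missing entries
n_distinct = sectors.nunique()         # distinct entries

print(first_alphabetically, last_alphabetically, n_values, n_distinct)
```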

**c) Select Multiple Columns**

To select multiple columns from our DataFrame, we have to pass a list of column names.

# This raises a KeyError: pandas looks for one column labelled by the whole tuple
df['Revenue', 'Employees','Country']
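
To see the failure without stopping the notebook, the call can be wrapped in `try`/`except`. This is a minimal sketch on a hypothetical two-row frame (`small` is an assumed name, with values copied from the first two companies):

```python
import pandas as pd

# Hypothetical two-row frame built from the first two companies
small = pd.DataFrame(
    {"Revenue": [274515, 200734],
     "Employees": [147000, 267937],
     "Country": ["USA", "South Korea"]},
    index=["Apple", "Samsung"],
)

try:
    # Single brackets with commas pass the tuple ('Revenue', 'Employees') as ONE key
    small["Revenue", "Employees"]
    raised = False
except KeyError:
    raised = True

# A list inside the brackets selects several columns and returns a DataFrame
subset = small[["Revenue", "Employees"]]
print(raised, list(subset.columns))
```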


# Ensure you use double square brackets [[]]
df[['Revenue', 'Employees','Country']]

# We can assign this selection to a new variable
df_new = df[['Revenue', 'Employees','Country']]

# Now when we call our new dataframe 
df_new

# While our original dataframe remains the same
df

**d) Select Multiple Columns and Apply Methods**

As with a Series, we can apply methods to our new DataFrame.

new_df = df[['Revenue', 'Employees']] # we removed the Country column as it is not a numerical column
new_df.mean()

**Check your knowledge:**

##### 3. Output a `Series` with the column `Employees`

employees_s = df["Employees"]

##### 4. Output the median `Employees` to the nearest whole number

employees_median = round(df["Employees"].median())

##### 5. Output the mean for columns `Revenue` and `Employees` to the nearest whole number

r_e_mean = round(df[["Revenue","Employees"]].mean())

### Selection by Index - `loc` 

`loc[row_label, column_label]`

# Find the revenue for Samsung 

# loc[row_label, column_label]

df.loc['Samsung','Revenue']

Notice if we use `:` in place of `row_label`, it will return all the data from the specified column.

Thus, we have a `Series`

# loc[row_label, column_label]

df.loc[:,'Revenue']

Let's now use `:` in place of `row_label` or `column_label`

# row_label
df.loc['Samsung',:]

# column_label
df.loc[:, 'Revenue']

Let's select a `list` of values this time:

# Multiple rows
df.loc[['Apple','Samsung','Sony'], 'Revenue']

# Multiple columns
df.loc['Apple', ['Employees','Country']]

rows = ['Apple','Samsung','Sony']
columns = ['Employees','Sector','Country']

# loc[row_label, column_label]

df.loc[rows,columns]

Slicing `start:stop:step`

a) With `columns`

df.loc['Apple', 'Employees':'Founding Date']

b) With `rows`

df.loc['Apple':'Sony', 'Employees']

c) With `step`

df.loc['Apple':'Sony':2, columns]

d) With `step` and `:`

df.loc['Apple':'Sony':2, :]

**Check your knowledge:**

##### 6. Using Index Selection, select the `Revenue`, `Employees` & `Sector` for the companies `Apple`, `Alphabet` and `Microsoft`

**Include a `step` value in your output.**
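
One possible answer, sketched on a trimmed copy of the first five companies so that it runs on its own (the names `df_small` and `index_selection` are hypothetical; in the notebook the same `loc` expression can be applied to `df` directly):

```python
import pandas as pd

# Trimmed copy of the first five rows of the dataset
df_small = pd.DataFrame(
    {"Revenue": [274515, 200734, 182527, 181945, 143015],
     "Employees": [147000, 267937, 135301, 878429, 163000],
     "Sector": ["Consumer Electronics", "Consumer Electronics",
                "Software Services", "Chip Manufacturing",
                "Software Services"],
     "Country": ["USA", "South Korea", "USA", "Taiwan", "USA"]},
    index=["Apple", "Samsung", "Alphabet", "Foxconn", "Microsoft"],
)

# Every second row from Apple to Microsoft, and the columns from Revenue to Sector
index_selection = df_small.loc["Apple":"Microsoft":2, "Revenue":"Sector"]
print(index_selection)
```

This works because Apple, Alphabet and Microsoft sit two rows apart in the index, and the three requested columns are adjacent.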

### Selection by Position - `iloc` 

`iloc[row_position, column_position]`

The outputs of the code below match those of the `loc` examples.

# Find the revenue for Samsung 
df.iloc[1, 0]

Notice if we use `:` in place of `row_position`, it will again return all the data from the specified column.

Thus, we have a `Series`

df.iloc[:,0]

Let's now use `:` in place of `row_position` or `column_position`

# row_position
df.iloc[1,:]

# column_position
df.iloc[:,0]

Let's select a `list` of values this time:

# Multiple rows
df.iloc[[0,1,8], 0]

# Multiple columns
df.iloc[0, [1,4]]

rows_i = [0,1,8]
columns_i = [1,2,4]

df.iloc[rows_i,columns_i]

Slicing `start:stop:step`:

a) With `columns`

df.iloc[0, 1:4]

b) With `rows`

# The stop position is exclusive, so 0:9 covers Apple (0) through Sony (8)
df.iloc[0:9, 1]

c) With `step`

df.iloc[0:9:2, columns_i]

d) With `step` & `:`

df.iloc[0:9:2, :]
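
One difference between the two accessors is worth checking: a `loc` slice includes its stop label, while an `iloc` slice excludes its stop position, as ordinary Python slicing does. A minimal sketch on a hypothetical three-row frame (`three` is an assumed name, with values copied from the first three companies):

```python
import pandas as pd

# Hypothetical three-row frame built from the first three companies
three = pd.DataFrame(
    {"Revenue": [274515, 200734, 182527]},
    index=["Apple", "Samsung", "Alphabet"],
)

by_label = three.loc["Apple":"Alphabet", "Revenue"]   # stop label is included
by_position = three.iloc[0:2, 0]                      # stop position 2 is excluded
by_position_full = three.iloc[0:3, 0]                 # 0:3 is needed to reach Alphabet

print(len(by_label), len(by_position), len(by_position_full))
```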

**Check your knowledge:**

##### 7. Using Position Selection, select the `Revenue`, `Employees` & `Country` for the companies `Samsung`, `Foxconn` and `Huawei`.

position_selection = df.iloc[[1,3,5],[0,1,-1]]

### The End!