## Coding Exercise

I have a list of tuples that contain two numbers each. How can I programmatically "flatten" this list? That is, how can I turn the following two-dimensional list:

```python
[(1, 2), (3, 5), (4, 9), (8, 1), (3, 4)]
```

Into the following one-dimensional list:

```python 
[1, 2, 3, 5, 4, 9, 8, 1, 3, 4]
```

Assume the following:
* Our list of tuples is called `tups`.
* Each tuple will always contain exactly 2 elements.
* While we have example data, our list will not always have the same length, so you cannot hard-code the number of iterations (for example, `for i in range` with a fixed count).

In [7]:
tups = [(1, 2), (3, 5), (4, 9), (8, 1), (3, 4)]

# write code here


[1, 2, 3, 5, 4, 9, 8, 1, 3, 4]

## Solution & Explanation

To "unfold" this 2D data structure, we must first loop through our list of tuples.

```python
for t in tups:
    print(t)
```

On each iteration, the loop assigns the next tuple to the variable `t`. This lets us work with each tuple inside the list individually.

From there, we can simply append the first and last element of `t` to a new, empty list.

```python
newlst = []
for t in tups:
    newlst.append(t[0])
    newlst.append(t[1])
```

In [None]:
newlst = []
for t in tups:
    newlst.append(t[0])
    newlst.append(t[1])
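
To check the loop above against the expected result, here is a minimal self-contained sketch. The list comprehension at the end is an alternative phrasing of the same idea, added for comparison; it is not part of the original solution.

```python
tups = [(1, 2), (3, 5), (4, 9), (8, 1), (3, 4)]

# Loop version from above: append both elements of every tuple.
newlst = []
for t in tups:
    newlst.append(t[0])
    newlst.append(t[1])

# Alternative: a list comprehension that visits every element of every tuple.
flat = [n for t in tups for n in t]

print(newlst)  # [1, 2, 3, 5, 4, 9, 8, 1, 3, 4]
```

Both versions work for a list of any length because they iterate over `tups` directly instead of relying on a fixed count.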

## Pandas

So far we’ve learned about various Python packages. These packages save us from “reinventing the wheel” and let us write working code with less time and effort.

One of those packages is pandas (the name comes from “panel data”). It was developed by a researcher while working at a financial research firm.

What does pandas allow us to do?
* Data manipulation, visualization, analysis all in one package.
* No more `csv` module!
* Low barrier to entry for maximum functionality
* Same data manipulation we do in Excel & SQL but in Python

Install pandas in your terminal via
* `pip3 install pandas` (MacOS)
* `pip install pandas` (Windows)


## OOP Review

At its simplest, Python code is just a sequence of methods being called on objects.
This is the core of Object-Oriented Programming.

```python
x = []
x.append(3)
x.remove(3)
```

If we understand this pattern, we can apply it to any other package and even to other object-oriented programming languages.

Namely, we can apply this pattern to pandas in order to do complex operations using just pre-built methods.

In pandas, our most commonly used object is a DataFrame.

## DataFrames

DataFrames are your source of data that you will transform, mutate, split, and feed into machine learning models.
To create our DataFrame, we have a number of options:
* Load in from a CSV file
* Create from a dictionary
* Create from a list

## List of Lists

This dataframe will describe orders from a website with columns `OrderID`, `Item`, `Quantity`.

In [None]:
# Firstly, let’s create a list of 3 lists:


# Next, let’s create a list of column names.


# Let’s combine these two data structures into a DataFrame to get the following data-structure.


# set its index.
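
The steps in the cell above could be filled in roughly as follows. This is only a sketch: the order values and the variable names `rows`, `columns`, `orders`, and `indexed` are made up for illustration, and only the column names come from the text.

```python
import pandas as pd

# Hypothetical order data: each inner list is one row (OrderID, Item, Quantity).
rows = [
    [1001, "Keyboard", 2],
    [1002, "Mouse", 1],
    [1003, "Monitor", 3],
]

# Column names, in the same order as the values in each row.
columns = ["OrderID", "Item", "Quantity"]

# Combine the rows and the column names into a DataFrame.
orders = pd.DataFrame(rows, columns=columns)

# set_index returns a new DataFrame; `orders` itself is left unchanged.
indexed = orders.set_index("OrderID")
print(indexed)
```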


## Mutating a DataFrame

Notice how this did not actually change our dataframe. What must be going on?

Many pandas methods return a new dataframe rather than changing the original. For some of them, we can specify `inplace=True` for our changes to stick to the original dataframe.

This shows that when we mutate a dataframe, we must do one of two things:
* Save the mutated dataframe into a variable (preferably one with a different name).
* Specify `inplace=True`. We only do this for small mutations that are related to our set-up.


We’ll see this come up again.
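
As a small illustration of the two options, the sketch below uses `set_index` on a made-up orders DataFrame. The data and the variable names are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical order data, used only to illustrate the two options.
orders = pd.DataFrame({
    "OrderID": [1001, 1002, 1003],
    "Item": ["Keyboard", "Mouse", "Monitor"],
    "Quantity": [2, 1, 3],
})

# Calling the method alone returns a new DataFrame and leaves `orders` untouched.
orders.set_index("OrderID")
print("OrderID" in orders.columns)  # True: OrderID is still a regular column

# Option 1: save the result under a new name.
orders_by_id = orders.set_index("OrderID")

# Option 2: mutate the original DataFrame with inplace=True.
orders.set_index("OrderID", inplace=True)
print(orders.index.name)  # OrderID
```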


## Dictionary 

This dataframe will be the same as before, orders from a website with columns `OrderID`, `Item`, `Quantity`.

In [None]:
# Firstly, let’s create a dictionary where each column is a string key, and each value is a 
# list describing the data in our column.


# We simply pass this dictionary into our DataFrame.


# how do we set its index?
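
One possible way to fill in the cell above is sketched here. The order values and the names `order_data` and `orders` are hypothetical; only the column names come from the text.

```python
import pandas as pd

# Hypothetical order data: each key is a column name,
# and each value is a list holding that column's data.
order_data = {
    "OrderID": [1001, 1002, 1003],
    "Item": ["Keyboard", "Mouse", "Monitor"],
    "Quantity": [2, 1, 3],
}

# Pass the dictionary straight into the DataFrame constructor.
orders = pd.DataFrame(order_data)

# Setting the index works the same way as with the list-of-lists version.
orders = orders.set_index("OrderID")
print(orders)
```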



## CSV

Most often, we find ourselves simply loading in data from an existing CSV file.

In [None]:
# load in csv file
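
A minimal sketch of loading CSV data with `pd.read_csv` follows. To keep it runnable without a file on disk, it reads from an in-memory string via `io.StringIO`; the CSV contents are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; the contents are made up for illustration.
csv_text = """OrderID,Item,Quantity
1001,Keyboard,2
1002,Mouse,1
1003,Monitor,3
"""

# read_csv accepts a file path or a file-like object.
# With a real file, we would pass the path as a string instead.
orders = pd.read_csv(io.StringIO(csv_text))
print(orders)
```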



## Why bother with the first two?

If we are almost always loading in csv files, why do we even bother with the other approaches (create from list of lists, dictionary)?
* Sometimes we are creating a DataFrame from a non-CSV data source (API, bs4 web scraping)
* Sometimes we are building a DataFrame from another DataFrame
* Base Answer: Knowing a variety of data representations makes us more powerful data engineers.
* Shallow Answer: Employers expect us to.

## MetaData

What is metadata? Essentially, it is data about our dataset: its size, its column names, its data types, and its summary statistics.
Ex: How many rows do we have? How many columns?

pandas has an attribute or method for each of these needs.
* Get dimensions of dataframe.
* Get column names of dataframe.
* Get data types of dataframe.
* Get summary statistics of dataframe.

In [None]:
# print out dimensions of your data


# print out columns of your data


# get info on your data


# get summary statistics on your data
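
The four items above map onto `shape`, `columns`, `dtypes` / `info()`, and `describe()`. The sketch below runs them on a small made-up DataFrame; the data is an assumption for illustration.

```python
import pandas as pd

# Hypothetical order data, used only so the calls below have something to report on.
orders = pd.DataFrame({
    "OrderID": [1001, 1002, 1003],
    "Item": ["Keyboard", "Mouse", "Monitor"],
    "Quantity": [2, 1, 3],
})

print(orders.shape)       # dimensions as (rows, columns)
print(orders.columns)     # column names
print(orders.dtypes)      # data type of each column
orders.info()             # prints column names, non-null counts, and dtypes
print(orders.describe())  # summary statistics for the numeric columns
```

Note that `shape`, `columns`, and `dtypes` are attributes (no parentheses), while `info()` and `describe()` are methods.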



## Features & Samples

Features are the attributes that describe a sample. Each column holds one feature.

Samples are discrete, separate data points. Each row describes one sample.

## Accessing Data

One of the most useful features of pandas is that we can access data within a DataFrame using the same principles that have been guiding us so far:
* Indexing
* Key-Value Accessing
* Looping


## Indexing

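We can use index position to get the values of a dataframe.

Using the same principle as lists, we can access rows using the `loc` indexer and square bracket notation! Note that `loc` selects by index label, which matches the position only while the dataframe has the default 0, 1, 2, ... index; `iloc` always selects by position.

Keep in mind, however, that we are interacting with a 2D dataset, so it helps to use two dimensions to access data: row first, then column. (RC Cola)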

In [None]:
x = ["Hyundai", "Honda", "BMW", "Toyota", "Ford", "Chevrolet"]

# how do I get the 3rd element?

In [None]:
import pandas as pd
df = pd.read_csv("../data/top_movies.csv")

# get the 3rd row of this df


# get the 3rd row of the "Movie_Name" column
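
The two cells above could be answered along these lines. Since `top_movies.csv` is not available here, the sketch builds a small stand-in DataFrame; the `Rank` column and the variable names are assumptions, and only `Movie_Name` comes from the text.

```python
import pandas as pd

x = ["Hyundai", "Honda", "BMW", "Toyota", "Ford", "Chevrolet"]

# Index positions start at 0, so the 3rd element is at index 2.
third_car = x[2]

# Small stand-in for top_movies.csv; the "Rank" column is hypothetical.
movies = pd.DataFrame({
    "Movie_Name": [
        "The Shawshank Redemption",
        "The Godfather",
        "The Dark Knight",
        "The Godfather Part II",
    ],
    "Rank": [1, 2, 3, 4],
})

third_row = movies.loc[2]                 # label 2 under the default integer index
third_row_by_pos = movies.iloc[2]         # always selects by position
third_name = movies.loc[2, "Movie_Name"]  # row first, then column

print(third_car)   # BMW
print(third_name)  # The Dark Knight
```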


## Key-Values (Columns)

Just as we access the values of keys in a dictionary, we can access the data in one column by using similar syntax.

In [25]:
import pandas as pd

data = {
    "make": ["Toyota", "Honda", "Ford"],
    "year": ["2021", "2011", "2020"],
    "model": ["Camry XLE", "CR-V EX", "Explorer"],
    "price": [24_000, 11_000, 27_000],
    "status": ["used", "used", "used"]
}

# how do I get the price list?

[24000, 11000, 27000]

In [None]:
import pandas as pd
df = pd.read_csv("../data/top_movies.csv")

# get the "Movie_Name" column
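
To make the parallel concrete, the sketch below uses a trimmed copy of the `data` dictionary from above and applies the same square-bracket syntax to a dictionary and to a DataFrame built from it. The names `prices`, `cars`, and `price_column` are chosen for illustration.

```python
import pandas as pd

# Trimmed copy of the dictionary from the earlier cell.
data = {
    "make": ["Toyota", "Honda", "Ford"],
    "price": [24_000, 11_000, 27_000],
}

# Dictionary: square brackets with the key return the value.
prices = data["price"]

# DataFrame: square brackets with the column name return that column as a Series.
cars = pd.DataFrame(data)
price_column = cars["price"]

print(prices)                 # [24000, 11000, 27000]
print(price_column.tolist())  # [24000, 11000, 27000]
```

The same pattern, `df["Movie_Name"]`, would select the `Movie_Name` column from the movies DataFrame.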



## Looping

Running these discrete actions and queries on our data is great.

However, we can do even more if we know how to loop through a DataFrame. What does the following loop print?

In [None]:
for row in df:
    print(row)

Just like looping over a dictionary gives only its keys, this gives only the column names.

To get the data we have a number of options. For now, we will start with `iterrows()`.
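
A minimal sketch of both loops follows, using a small stand-in for the movies DataFrame. The `Rank` column and the variable names are assumptions for illustration.

```python
import pandas as pd

# Small stand-in for top_movies.csv; the "Rank" column is hypothetical.
movies = pd.DataFrame({
    "Movie_Name": ["The Shawshank Redemption", "The Godfather", "The Dark Knight"],
    "Rank": [1, 2, 3],
})

# Looping over the DataFrame itself yields the column names.
column_names = [col for col in movies]

# iterrows() yields (index, row) pairs; each row is a Series keyed by column name.
names = []
for index, row in movies.iterrows():
    names.append(row["Movie_Name"])
    print(row["Movie_Name"])
```

Unpacking into `index, row` matters here: without it, each loop item would be a tuple rather than the row itself.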

In [32]:
df = pd.read_csv("../data/top_movies.csv")
# loop using iterrows


The Shawshank Redemption
The Godfather
The Dark Knight
The Godfather Part II
12 Angry Men
Schindler's List
The Lord of the Rings: The Return of the King
Pulp Fiction
The Lord of the Rings: The Fellowship of the Ring
Il buono, il brutto, il cattivo
Forrest Gump
Fight Club
The Lord of the Rings: The Two Towers
Inception
The Empire Strikes Back
The Matrix
Goodfellas
One Flew Over the Cuckoo's Nest
Se7en
Shichinin no samurai
It's a Wonderful Life
The Silence of the Lambs
Cidade de Deus
Saving Private Ryan
La vita è bella
Interstellar
The Green Mile
Star Wars
Terminator 2: Judgment Day
Back to the Future
Sen to Chihiro no kamikakushi
Psycho
The Pianist
Gisaengchung
Léon
The Lion King
Gladiator
American History X
The Departed
The Usual Suspects
The Prestige
Whiplash
Casablanca
Seppuku
The Intouchables
Hotaru no haka
Modern Times
Once Upon a Time in the West
Rear Window
Nuovo Cinema Paradiso
Alien
City Lights
Apocalypse Now
Memento
Raiders of the Lost Ark
Django Unchained
WALL·E
The Lives of 