# Create your own table

In [20]:
from IPython.display import Image
Image(url="https://www.brickshop.eu/components/com_virtuemart/shop_image/product/LEGO_31011_Vlieg_51821171a4509.jpg", width=400)

**In this notebook we will**
- Create a new dataframe with information about planes
- Enrich the dataframe with new data
- Send the data to our database

The first steps are done in the usual code-along form.  
Then it is your turn to add data and send it to the database.

In [1]:
# Import all necessary libraries
import pandas as pd
import numpy as np
import requests
from zipfile import *
from configdef import config, pg_engine_connection  # helper functions used below to read the settings and build the engine
from sqlalchemy import exc  # SQLAlchemy provides a "Pythonic" way of interacting with databases
from sqlalchemy import event

# 1. Set up a connection 

As before, we start by connecting to our SQL database.

In [2]:
# Establish db connection

# Read the connection details from the configdef file into a dictionary
params = config(section='postgres')

# Use SQLAlchemy to create the connection to the database, which is held by the engine object
engine = pg_engine_connection(**params)

# Close pooled connections that are no longer needed; the engine reconnects on the next query
engine.dispose()

Postgres Database connection successful


# 2. Build new dataframe with information about planes 

Our nyflights table has a column "tailnum", which holds the identification number painted on each aircraft.  
We will use the unique tail numbers as the plane identifier.  
We query the column and then store the result in a dataframe for further processing:

In [3]:
planes = engine.execute('SELECT DISTINCT tailnum FROM nyflights').fetchall()

In [4]:
# Create a pandas dataframe with queried data
df_planes = pd.DataFrame(planes, columns = ["tailnumber"])
df_planes.dropna(inplace = True)

In [5]:
df_planes.head()

| | tailnumber |
|---|---|
| 1 | N8647A |
| 2 | N342DN |
| 3 | N665NK |
| 4 | N585NN |
| 5 | N904WN |
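The query-then-wrap steps above could also be collapsed into a single call with `pd.read_sql_query`, which may be useful because `Engine.execute` is not available in SQLAlchemy 2.0 and later. The following is only a sketch: it uses an in-memory SQLite database with a few made-up rows as a stand-in for the course's Postgres database, and the name `df_demo` is hypothetical. In the notebook you would pass the `engine` (or a connection obtained from it) instead of `conn`.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stand-in for the course database (assumption, for illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nyflights (tailnum TEXT)")
conn.executemany(
    "INSERT INTO nyflights (tailnum) VALUES (?)",
    [("N8647A",), ("N342DN",), ("N8647A",), (None,)],
)

# Read the query result straight into a dataframe.
# The column is renamed and missing tail numbers are filtered out in SQL.
df_demo = pd.read_sql_query(
    "SELECT DISTINCT tailnum AS tailnumber FROM nyflights WHERE tailnum IS NOT NULL",
    conn,
)
conn.close()

print(df_demo)
```

Filtering with `WHERE tailnum IS NOT NULL` plays the same role as the `dropna` call, so the dataframe index starts at 0 instead of skipping a dropped row.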


# 3. Enrich dataframe with new information

Now we can add more columns to our dataframe.  
First, let's create a new column that assigns a randomly chosen manufacturer to each entry.

In [6]:
# np.random.choice generates a random sample from a given 1-D array
def choose_mfg():
    return np.random.choice(["airbus", "boeing"])
choose_mfg()

'airbus'

In [7]:
# Call choose_mfg once per row (via apply with a lambda) to fill the new column
df_planes['manufacturer'] = df_planes.apply(lambda x: choose_mfg(), axis=1)

In [8]:
df_planes.head()

| | tailnumber | manufacturer |
|---|---|---|
| 1 | N8647A | airbus |
| 2 | N342DN | boeing |
| 3 | N665NK | airbus |
| 4 | N585NN | boeing |
| 5 | N904WN | airbus |
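Calling a Python function once per row works, but the random column changes on every run and `apply` gets slow on large tables. A possible alternative, sketched here on a small stand-in dataframe (`df_demo` and the seed value are assumptions, not part of the original notebook), is to draw all values in one vectorized call from a seeded NumPy generator:

```python
import numpy as np
import pandas as pd

# Small stand-in for df_planes, using the tail numbers from the output above
df_demo = pd.DataFrame(
    {"tailnumber": ["N8647A", "N342DN", "N665NK", "N585NN", "N904WN"]}
)

# A seeded generator makes the "random" column reproducible between runs
rng = np.random.default_rng(seed=42)

# Draw one manufacturer per row in a single vectorized call
df_demo["manufacturer"] = rng.choice(["airbus", "boeing"], size=len(df_demo))

print(df_demo)
```

With a fixed seed, everyone running the notebook gets the same fake data, which makes later results easier to compare.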


Another option is to use data that is already available in the nyflights table and build, for example, aggregated measures for our planes dataframe.  
Let's select the earliest flight date for each plane to approximate its "maiden flight" (strictly speaking, its first flight within this dataset).

In [9]:
# Select the minimum flightdate per tailnumber
first_flight = engine.execute('select tailnum, min(FlightDate) from nyflights group by tailnum').fetchall()

In [10]:
# Create a pandas dataframe with queried data
df_firstflight = pd.DataFrame(first_flight, columns = ["tailnumber", "first_flight"])
df_firstflight.dropna(inplace = True)
df_firstflight.head()

| | tailnumber | first_flight |
|---|---|---|
| 0 | 215NV | 2018-01-01 |
| 1 | 216NV | 2018-01-01 |
| 2 | 217NV | 2018-01-01 |
| 3 | 218NV | 2018-01-01 |
| 4 | 219NV | 2018-01-01 |
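As an aside, if the flight records were already loaded into a dataframe, the same per-plane minimum could be computed in pandas with `groupby`. This is a sketch on made-up rows; `df_flights`, `df_first` and the lowercase column names are assumptions for illustration.

```python
import pandas as pd

# Made-up flight records standing in for the nyflights table
df_flights = pd.DataFrame(
    {
        "tailnum": ["N8647A", "N8647A", "N342DN", None],
        "flightdate": pd.to_datetime(
            ["2018-03-05", "2018-01-01", "2018-09-01", "2018-02-02"]
        ),
    }
)

# Equivalent of: SELECT tailnum, MIN(flightdate) ... GROUP BY tailnum
df_first = (
    df_flights.dropna(subset=["tailnum"])           # ignore rows without a tail number
    .groupby("tailnum", as_index=False)["flightdate"]
    .min()                                          # earliest date per plane
    .rename(columns={"tailnum": "tailnumber", "flightdate": "first_flight"})
)

print(df_first)
```

Doing the aggregation in SQL, as in the cell above, is usually preferable for large tables because only one row per plane has to be transferred.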


Having successfully stored our first_flight information in a dataframe, we now merge the two dataframes on their common column "tailnumber".

In [11]:
df_planes = pd.merge(df_planes, df_firstflight, how="left", on="tailnumber")

In [12]:
df_planes.head()

| | tailnumber | manufacturer | first_flight |
|---|---|---|---|
| 0 | N8647A | airbus | 2018-01-01 |
| 1 | N342DN | boeing | 2018-09-01 |
| 2 | N665NK | airbus | 2018-01-01 |
| 3 | N585NN | boeing | 2018-01-01 |
| 4 | N904WN | airbus | 2018-01-01 |
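A left join can silently multiply rows if the key is duplicated on either side, or leave gaps where no match exists. One way to guard against both, sketched on small stand-in dataframes (`df_left`, `df_right` and `df_merged` are hypothetical names), is to use the `validate` and `indicator` parameters of `pd.merge`:

```python
import pandas as pd

# Small stand-ins for df_planes and df_firstflight
df_left = pd.DataFrame(
    {
        "tailnumber": ["N8647A", "N342DN", "N665NK"],
        "manufacturer": ["airbus", "boeing", "airbus"],
    }
)
df_right = pd.DataFrame(
    {
        "tailnumber": ["N8647A", "N342DN"],
        "first_flight": ["2018-01-01", "2018-09-01"],
    }
)

# validate="one_to_one" raises a MergeError if a key is duplicated on either side;
# indicator=True adds a "_merge" column showing where each row found a match
df_merged = pd.merge(
    df_left,
    df_right,
    how="left",
    on="tailnumber",
    validate="one_to_one",
    indicator=True,
)

n_duplicates = int(df_merged["tailnumber"].duplicated().sum())
n_unmatched = int((df_merged["_merge"] == "left_only").sum())
print(n_duplicates, n_unmatched)
```

Here one plane has no entry on the right-hand side, so it keeps a missing `first_flight` and is counted as unmatched. The `_merge` column can be dropped once the check is done.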


## Task

1. Create at least three new columns for the planes table. Think about what other information would be useful in the data analysis later.  
Either create random, fake data or use data from the available nyflights table, as shown above.  
For example, the maximum speed might be interesting, or some fake features describing the appearance or performance of the plane.

2. Merge everything into one dataframe. Check that there are no duplicates or other problems that may have crept in through aggregating and joining.

3. Send your data to a new table in the database.  
Use the to_sql method we have used in these notebooks.  
**Important: Give the table you send your data to a distinct name (for example planes_your_name), so that you can find it later on.**
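As a reminder of how `to_sql` behaves, here is a minimal sketch that writes a small dataframe to a table and reads it back. It uses an in-memory SQLite database as a stand-in (an assumption for illustration); in the notebook you would pass the SQLAlchemy `engine` instead of `conn`. The dataframe contents and the table name are placeholders.

```python
import sqlite3

import pandas as pd

# Placeholder data standing in for your finished planes dataframe
df_demo = pd.DataFrame(
    {
        "tailnumber": ["N8647A", "N342DN"],
        "manufacturer": ["airbus", "boeing"],
    }
)

# In-memory SQLite stand-in for the course database (assumption)
conn = sqlite3.connect(":memory:")
table_name = "planes_your_name"  # placeholder: replace with your own distinct name

# if_exists="replace" overwrites an existing table of the same name;
# index=False keeps the dataframe index out of the table
df_demo.to_sql(table_name, conn, if_exists="replace", index=False)

# Read the table back to confirm that the upload worked
df_check = pd.read_sql_query(f"SELECT * FROM {table_name}", conn)
conn.close()

print(df_check)
```

Reading the table back right after writing is a cheap way to confirm that the row count and column names match what you intended to send.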