### Instructions: 
Download the 'Currency_Continent.csv', 'Currency_Map.csv' and 'ETFs.csv' files and place them in a folder you can access.

---------------

1. a) In the cell below, import the sqlite3 package, the pandas package as pd, and the datetime package as dt.

In [1]:
import sqlite3
import pandas as pd
import datetime as dt

    b) Create a connection to a SQL database called 'labs.db'.  Then create the cursor object through that connection.

In [2]:
# Open a connection to a db file stored locally on disk - if the file doesn't exist, it is created
connection = sqlite3.connect('labs.db')

# In order to run SQL commands with sqlite3, we must create a cursor object
# that executes the commands against the database
cursor = connection.cursor()
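
As a quick check that a connection and cursor behave as described, the sketch below repeats the two steps against an in-memory database. The `':memory:'` target and the `demo_` names are assumptions made for this illustration so that no file is written; the lab itself connects to a file on disk.

```python
import sqlite3

# ':memory:' creates a temporary database in RAM (assumption for this sketch)
demo_connection = sqlite3.connect(':memory:')
demo_cursor = demo_connection.cursor()

# execute() runs one SQL statement; fetchone() returns the first result row as a tuple
demo_cursor.execute("SELECT 1 + 1;")
result = demo_cursor.fetchone()
print(result)

demo_connection.close()
```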

    c) Edit the filepaths in the code below if needed to load in the 'Currency_Continent.csv', 'Currency_Map.csv' and 'ETFs.csv' files to SQL tables.

In [3]:
%cd ../
%cd Data
%ls    

# Map each CSV file to the SQL table it is loaded into
csv_tables = {'Currency_Map.csv': 'Currency_Map',
              'Currency_Continent.csv': 'Currency_Continent',
              'ETFs.csv': 'ETF_Data'}

for csv_file, table_name in csv_tables.items():
    index_start = 1  # the SQL index column starts at 1

    for df in pd.read_csv(csv_file, iterator=True, encoding='utf-8'):
        # Remove spaces from column names
        df = df.rename(columns={c: c.replace(' ', '') for c in df.columns})

        # Index the data
        df.index += index_start
        # name of SQL table, connection, replace the table if it already exists
        df.to_sql(table_name, connection, if_exists='replace')
        index_start = df.index[-1] + 1  # update index start
    print('done')

/Users/Matthew/Documents/Consulting/Course_Extras/Labs
/Users/Matthew/Documents/Consulting/Course_Extras/Labs/Data
Currency_Continent.csv  ETFs.csv                lecture.db
Currency_Map.csv        Inventory_Data.csv
Demand_Plan.csv         SPY.csv
done
done
done
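
The loading code writes with `if_exists='replace'`, which recreates the table on every write. If a file were read in several chunks, a common pattern is to replace on the first chunk and append afterwards. The sketch below illustrates this under stated assumptions: the three-row CSV text, the `Demo` table name and `chunksize=2` are all hypothetical and chosen only to force more than one chunk.

```python
import io
import sqlite3
import pandas as pd

# Hypothetical three-row CSV standing in for a file on disk
csv_text = "Currency,USD Conversion\nGBP,1.3\nEUR,1.18\nCAD,0.75\n"

demo_connection = sqlite3.connect(':memory:')

# chunksize=2 makes read_csv yield DataFrames of at most two rows each
for i, chunk in enumerate(pd.read_csv(io.StringIO(csv_text), chunksize=2)):
    # Remove spaces from column names, as in the lab code
    chunk = chunk.rename(columns={c: c.replace(' ', '') for c in chunk.columns})
    # read_csv keeps a running 0-based index across chunks, so shift it by one
    chunk.index += 1
    # Replace the table on the first chunk, then append so earlier rows are kept
    mode = 'replace' if i == 0 else 'append'
    chunk.to_sql('Demo', demo_connection, if_exists=mode)

row_count = demo_connection.execute("SELECT COUNT(*) FROM Demo;").fetchone()[0]
print(row_count)
demo_connection.close()
```

With `if_exists='replace'` on every chunk, only the rows of the last chunk would remain in the table.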


2. Create a table of your top 5 movies or TV shows with the year each was released, your rating out of 10 and the genre.  Name this table "Movies".

In [4]:
# Drop any existing Movies table so the cell can be re-run from a clean state
cursor.execute("DROP TABLE IF EXISTS Movies;")

# We can define long SQL commands within triple quotes

sql_command = """
CREATE TABLE Movies AS
  SELECT 'Movie 1' AS Movie, 1994 AS Year, 10 AS Rating, 'Genre1' AS Genre UNION
  SELECT 'Movie 2', 2014, 9.5, 'Genre1' UNION
  SELECT 'Movie 3', 1978, 9.4, 'Genre2' UNION
  SELECT 'Movie 4', 1983, 9.3, 'Genre1' UNION
  SELECT 'Movie 5', 2012, 9.1, 'Genre3'
  ;"""

# In order to run SQL commands on the database file, we have to execute them with the cursor
cursor.execute(sql_command)

# Load the sql table into a pandas dataframe for aesthetics
# The function's arguments are (sql_query_to_run, connection_to_database)
pd.read_sql_query('SELECT * FROM Movies',con = connection)

Unnamed: 0,Movie,Year,Rating,Genre
0,Movie 1,1994,10.0,Genre1
1,Movie 2,2014,9.5,Genre1
2,Movie 3,1978,9.4,Genre2
3,Movie 4,1983,9.3,Genre1
4,Movie 5,2012,9.1,Genre3
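
One detail of building a table from chained `SELECT ... UNION` rows is that `UNION` removes duplicate rows, while `UNION ALL` keeps every row. The sketch below shows the difference on a deliberately duplicated row; the in-memory database and `demo_` names are assumptions for illustration.

```python
import sqlite3

demo_connection = sqlite3.connect(':memory:')
demo_cursor = demo_connection.cursor()

# UNION drops the repeated row
demo_cursor.execute("SELECT 'Movie 1' AS Movie UNION SELECT 'Movie 1';")
union_rows = demo_cursor.fetchall()

# UNION ALL keeps both copies
demo_cursor.execute("SELECT 'Movie 1' AS Movie UNION ALL SELECT 'Movie 1';")
union_all_rows = demo_cursor.fetchall()

print(len(union_rows), len(union_all_rows))
demo_connection.close()
```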


3. Create another table related to question 2 that has the movie/TV show name as one column and the character name as another, using Python lists and the INSERT INTO command.  Include at least 2 characters for every movie/TV show you use.  Name this table "Characters".

In [5]:
# Drop any existing Characters table so the cell can be re-run from a clean state
cursor.execute("DROP TABLE IF EXISTS Characters;")

sql_command = """
CREATE TABLE Characters (
Movie VARCHAR(20),
Character VARCHAR(20))
;"""

cursor.execute(sql_command)

# list of tuples we want to add to the database
movie_data = [("Movie 1", "Character1"),
              ("Movie 1", "Character2"),
              ("Movie 2", "Character3"),
              ("Movie 2", "Character4"),
              ("Movie 2", "Character5")]

# SQL command with ? placeholders; sqlite3 substitutes the values safely,
# so names containing quotes do not break the statement
insert_command = '''
INSERT INTO Characters (Movie, Character)
VALUES (?, ?);
'''

for s in movie_data:
    # s is a (movie, character) tuple that fills the two placeholders
    cursor.execute(insert_command, s)

connection.commit()
pd.read_sql_query('SELECT * FROM Characters',con = connection)

Unnamed: 0,Movie,Character
0,Movie 1,Character1
1,Movie 1,Character2
2,Movie 2,Character3
3,Movie 2,Character4
4,Movie 2,Character5
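
An alternative sketch for the same insert step, assuming the same two-column table: `executemany` with `?` placeholders inserts the whole list in one call, and the driver handles quoting. The second character name below is a hypothetical value chosen because it contains double quotes, which would break a statement built by string formatting.

```python
import sqlite3

demo_connection = sqlite3.connect(':memory:')
demo_cursor = demo_connection.cursor()
demo_cursor.execute("CREATE TABLE Characters (Movie VARCHAR(20), Character VARCHAR(20));")

# Hypothetical rows; the second name contains double quotes on purpose
movie_data = [("Movie 1", "Character1"),
              ("Movie 1", 'The "Boss"')]

# One call inserts every tuple; each ? is filled from the tuple in order
demo_cursor.executemany(
    "INSERT INTO Characters (Movie, Character) VALUES (?, ?);", movie_data)
demo_connection.commit()

rows = demo_cursor.execute("SELECT Movie, Character FROM Characters;").fetchall()
print(rows)
demo_connection.close()
```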


4. Select your top 2 rated movies using an order by and limit command from the "Movies" table.

In [6]:
sql_statement = """
SELECT *
FROM Movies
ORDER BY Rating DESC
LIMIT 2
;
"""

pd.read_sql_query(sql_statement,con = connection)

Unnamed: 0,Movie,Year,Rating,Genre
0,Movie 1,1994,10.0,Genre1
1,Movie 2,2014,9.5,Genre1


5. Join the "Movies" and "Characters" tables together on the movie name column they share.

In [7]:
sql_statement = """
SELECT *
FROM Movies AS a
JOIN Characters AS b
ON a.Movie = b.Movie
;
"""

pd.read_sql_query(sql_statement,con = connection)

Unnamed: 0,Movie,Year,Rating,Genre,Movie.1,Character
0,Movie 1,1994,10.0,Genre1,Movie 1,Character1
1,Movie 1,1994,10.0,Genre1,Movie 1,Character2
2,Movie 2,2014,9.5,Genre1,Movie 2,Character3
3,Movie 2,2014,9.5,Genre1,Movie 2,Character4
4,Movie 2,2014,9.5,Genre1,Movie 2,Character5
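
Only movies that have at least one row in "Characters" appear in the joined output, because an inner join keeps matching rows only. The sketch below contrasts an inner join with a `LEFT JOIN` on two small hypothetical tables (two movies, characters for only one of them); it also selects named columns so the `Movie` column is not duplicated.

```python
import sqlite3

demo_connection = sqlite3.connect(':memory:')
cur = demo_connection.cursor()
cur.execute("CREATE TABLE Movies (Movie TEXT, Rating REAL);")
cur.execute("CREATE TABLE Characters (Movie TEXT, Character TEXT);")
cur.executemany("INSERT INTO Movies VALUES (?, ?);",
                [("Movie 1", 10), ("Movie 3", 9.4)])
cur.executemany("INSERT INTO Characters VALUES (?, ?);",
                [("Movie 1", "Character1"), ("Movie 1", "Character2")])

# Inner join: only movies with a matching character row are returned
inner_rows = cur.execute("""
SELECT a.Movie, a.Rating, b.Character
FROM Movies AS a
JOIN Characters AS b ON a.Movie = b.Movie
ORDER BY b.Character;
""").fetchall()

# Left join: every movie is returned; missing characters come back as NULL (None)
left_rows = cur.execute("""
SELECT a.Movie, b.Character
FROM Movies AS a
LEFT JOIN Characters AS b ON a.Movie = b.Movie
ORDER BY a.Movie, b.Character;
""").fetchall()

print(inner_rows)
print(left_rows)
demo_connection.close()
```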


6. Using the table you created above, display the count of the movies you had by genre.

In [8]:
sql_statement = """
SELECT Genre,Count(a.Movie)
FROM Movies AS a
JOIN Characters AS b
ON a.Movie = b.Movie
GROUP BY Genre
;
"""

pd.read_sql_query(sql_statement,con = connection)

Unnamed: 0,Genre,Count(a.Movie)
0,Genre1,5


7. Using that same table, display the total sum of the ratings and the count of characters for the movies in every genre.

In [9]:
sql_statement = """
SELECT Genre,sum(Rating),Count(Character)
FROM Movies AS a
JOIN Characters AS b
ON a.Movie = b.Movie
GROUP BY Genre
;
"""

pd.read_sql_query(sql_statement,con = connection)

Unnamed: 0,Genre,sum(Rating),Count(Character)
0,Genre1,48.5,5


8. Query the fund_yield and fund_name for all PIMCO funds.

In [10]:
sql_statement = """
SELECT fund_name, fund_yield
FROM ETF_Data
WHERE fund_family = 'PIMCO'
;
"""

pim=pd.read_sql_query(sql_statement,con = connection)
pim

Unnamed: 0,fund_name,fund_yield
0,BOND,3.52
1,CORP,3.39
2,HYS,4.76
3,LDUR,3.07
4,LTPZ,3.08
5,MFDX,2.72
6,MFEM,2.8
7,MFUS,1.77
8,MINT,2.48
9,MUNI,2.57


In [11]:
#for p in pim.columns:
#    print(p)
#pim['fund_family'].unique()

9. Using the Currency_Map table, find the conversion rate between USD and GBP.  Please note that this table provides USD per unit of foreign currency.  Your final answer should be a number.

In [12]:
sql_statement = """
SELECT USDConversion
FROM Currency_Map
WHERE Currency = 'GBP';
"""
conversion_rate = pd.read_sql_query(sql_statement,con = connection).iloc[0, 0]
print(conversion_rate)

1.3
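
To look up a rate for any currency rather than relying on row order, the currency code can be passed as a query parameter. The sketch below rebuilds a small Currency_Map table in memory from the three rates shown in this lab; the helper name `usd_per_unit` is hypothetical.

```python
import sqlite3
import pandas as pd

demo_connection = sqlite3.connect(':memory:')
demo_connection.execute("CREATE TABLE Currency_Map (Currency TEXT, USDConversion REAL);")
demo_connection.executemany("INSERT INTO Currency_Map VALUES (?, ?);",
                            [("GBP", 1.3), ("EUR", 1.18), ("CAD", 0.75)])

def usd_per_unit(currency, con):
    """Hypothetical helper: return USD per one unit of the given currency."""
    query = "SELECT USDConversion FROM Currency_Map WHERE Currency = ?;"
    # params fills the ? placeholder, so the code is never pasted into the SQL text
    rate_table = pd.read_sql_query(query, con=con, params=(currency,))
    return rate_table.iloc[0, 0]

gbp_rate = usd_per_unit("GBP", demo_connection)
eur_rate = usd_per_unit("EUR", demo_connection)
print(gbp_rate, eur_rate)
demo_connection.close()
```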


10. Using the previous solution, query fund_name and net_assets from ETF_Data in GBP.

In [13]:
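# Currency_Map gives USD per GBP (1.3). The query below multiplies by that rate;
# if net_assets is denominated in USD, converting to GBP would divide by 1.3 instead.
sql_statement = """
SELECT fund_name, net_assets*1.3 as GBP_net_assets
FROM ETF_data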
;
"""

pd.read_sql_query(sql_statement,con = connection)

Unnamed: 0,fund_name,GBP_net_assets
0,1305,5.213000e+12
1,1306,1.095900e+13
2,1308,4.849000e+12
3,1309,5.785000e+09
4,1310,2.496000e+09
...,...,...
2347,ZBIO,6.019000e+06
2348,ZIV,1.288040e+08
2349,ZMLP,6.619600e+07
2350,ZROZ,2.596620e+08
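
The direction of the conversion depends on how the rate is quoted. Since Currency_Map gives USD per unit of foreign currency, a worked example may help; the 130 USD amount is hypothetical, and the sketch assumes the amount being converted is denominated in USD.

```python
usd_per_gbp = 1.3    # rate from Currency_Map: USD per 1 GBP
usd_amount = 130.0   # hypothetical amount in USD

# USD -> GBP: divide by USD-per-GBP (about 100 GBP here)
gbp_amount = usd_amount / usd_per_gbp

# GBP -> USD: multiply by USD-per-GBP, which recovers the starting amount
round_trip_usd = gbp_amount * usd_per_gbp

print(gbp_amount, round_trip_usd)
```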


11. Query all fund_name and fund_treynor_ratio_5years where fund_treynor_ratio_5years is greater than 2.  Sort the query by fund_name in ascending order.

In [14]:
sql_statement = """
SELECT fund_name, fund_treynor_ratio_5years
FROM ETF_Data
WHERE fund_treynor_ratio_5years > 2
ORDER BY fund_name ASC
;
"""
pd.read_sql_query(sql_statement,con = connection)

Unnamed: 0,fund_name,fund_treynor_ratio_5years
0,1305,8.01
1,1306,8.03
2,1308,8.03
3,1309,6.32
4,1310,4.08
...,...,...
904,XTL,5.38
905,XTN,5.76
906,YANG,13.04
907,YAO,4.69


12. Create a query that returns the USD conversion and continent for all matching currencies between Currency_Map and Currency_Continent. 

In [15]:
join_statement = """
SELECT * FROM Currency_Map as a, Currency_Continent as b 
WHERE a.Currency=b.Currency;
"""

pd.read_sql_query(join_statement,con = connection)

Unnamed: 0,index,Currency,USDConversion,index.1,Currency.1,Continent
0,1,GBP,1.3,1,GBP,Europe
1,2,EUR,1.18,2,EUR,Europe
2,3,CAD,0.75,3,CAD,North America


13. Find the total of net_assets for all ProShares ETFs.  Your answer should be a number.

In [16]:
sql_statement = """
SELECT SUM(net_assets)
FROM ETF_Data
WHERE fund_family = 'ProShares';
"""
total_net_assets = pd.read_sql_query(sql_statement,con = connection).iloc[0, 0]
print(total_net_assets)

31397193290.0
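
When reading a total like this, it helps to know that `SUM` skips NULL values rather than treating them as zero or failing. A minimal sketch on a hypothetical three-row table (the fund names and asset values are invented for illustration):

```python
import sqlite3

demo_connection = sqlite3.connect(':memory:')
demo_connection.execute(
    "CREATE TABLE ETF_Data (fund_name TEXT, fund_family TEXT, net_assets REAL);")
# Hypothetical rows; one ProShares fund has no net_assets value (NULL)
demo_connection.executemany("INSERT INTO ETF_Data VALUES (?, ?, ?);",
                            [("AAA", "ProShares", 100.0),
                             ("BBB", "ProShares", None),
                             ("CCC", "PIMCO", 50.0)])

# SUM ignores the NULL row, so only the 100.0 contributes
total = demo_connection.execute(
    "SELECT SUM(net_assets) FROM ETF_Data WHERE fund_family = ?;",
    ("ProShares",)).fetchone()[0]

print(total)
demo_connection.close()
```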


In [17]:
connection.commit() 

connection.close()
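
As an alternative to calling `commit()` and `close()` by hand, the standard library can manage both. In the sketch below, `contextlib.closing` closes the connection when the outer block ends, and using the connection itself as a context manager commits on success or rolls back on an exception. The in-memory database and table `t` are assumptions for illustration.

```python
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(':memory:')) as demo_connection:
    # The inner block commits on success and rolls back if an exception is raised
    with demo_connection:
        demo_connection.execute("CREATE TABLE t (x INTEGER);")
        demo_connection.execute("INSERT INTO t VALUES (1);")
    count = demo_connection.execute("SELECT COUNT(*) FROM t;").fetchone()[0]

# After the outer block the connection is closed, so further use raises an error
try:
    demo_connection.execute("SELECT 1;")
    closed = False
except sqlite3.ProgrammingError:
    closed = True

print(count, closed)
```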