# BLU03 - Exercises Notebook

In [None]:
import hashlib # for grading purposes
import math
import numpy as np
import pandas as pd
import requests
import sqlalchemy

from bs4 import BeautifulSoup

## Part A - SQL exercises

### Querying a StockDatabase with a SQL client

Open your favorite SQL client and connect to the StockDatabase.
The connection settings are as follows:

* host: batch4-s02-db-instance.ctq2kxc7kx1i.eu-west-1.rds.amazonaws.com
* port: 5432
* user: ldsa_student
* database: s02_db
* schema: public
* password: XXX (shared through slack)

This is a different schema from the one used in the Learning Notebooks, so don't forget to switch to it (see the Learning Notebook for how). It contains information about stock tickers, including their location, some financial information, and whether they are in certain indices.

The tables in this schema are the following:

1. Stock: has information on ticker, stock name, and sector and industry information.
2. Financial: contains latest price and marketcap of all tickers.
3. Location: contains information about where the company is located.
4. Info: contains information about whether a given company is in a certain index (SP500, for example).

You can preview these tables using the SQL client.

### Note

Some table and column names are reserved words or contain uppercase letters (PostgreSQL folds unquoted names to lowercase), so they have to be written in double quotes in the queries. For example, in the Financial table (`financial`), you can refer to the Name column as
`financial."Name"`.
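The quoting rule can be tried out locally with a minimal sketch that uses Python's built-in `sqlite3` module and an in-memory database as a stand-in for the course's PostgreSQL server. The `financial` table, its columns, and its rows are hypothetical toy data. Double quotes mark identifiers, while single quotes (or `?` placeholders, as here) are for string values. One caveat: SQLite treats identifiers case-insensitively, whereas PostgreSQL only matches a column created as `"Name"` when you quote it.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL server.
# Table, columns, and rows are hypothetical toy data.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE financial ("Ticker" TEXT, "Name" TEXT, marketcap REAL)')
conn.executemany(
    "INSERT INTO financial VALUES (?, ?, ?)",
    [("XYZ", "Example Corp", 12.5), ("ABC", "Sample Inc", 250.0)],
)

# Double quotes wrap the identifiers; the ticker value is passed as a parameter.
# In a SQL client you would write the literal inline instead: = 'XYZ'
query = 'SELECT financial."Name" FROM financial WHERE financial."Ticker" = ?'
name = conn.execute(query, ("XYZ",)).fetchone()[0]
print(name)  # Example Corp
```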

### Q1. What is the name of the company with ticker T?

Write a query that selects the name of the company with ticker `T`, and run it in the SQL client.

Then, assign the result to variable q1_answer (just copy and paste the name you obtained).

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
expected_hash = '3374aaf6d4a7add286557a7003ae41a6b0b2eac3635283fcffa6ddcb1c81af56'
assert hashlib.sha256(q1_answer.encode()).hexdigest() == expected_hash

### Q2. Count how many companies have a marketcap greater than 100 (billion)

Write a query that counts the number of companies with a marketcap (`marketcap`) greater than 100 (billion).

Then, assign the result to variable q2_answer (just copy and paste the value).
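The counting pattern (`COUNT(*)` over the rows that pass a `WHERE` filter) can be sketched with `sqlite3` on a hypothetical toy table. The sketch assumes `marketcap` is stored in billions, which is worth confirming by previewing the real table before choosing the threshold.

```python
import sqlite3

# Hypothetical toy table; marketcap is assumed to be stored in billions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE financial (ticker TEXT, marketcap REAL)")
conn.executemany(
    "INSERT INTO financial VALUES (?, ?)",
    [("AAA", 50.0), ("BBB", 150.0), ("CCC", 100.0), ("DDD", 900.0)],
)

# COUNT(*) counts the rows that survive the WHERE filter.
# A marketcap of exactly 100 is excluded because the comparison is strict (>).
(count,) = conn.execute("SELECT COUNT(*) FROM financial WHERE marketcap > 100").fetchone()
print(count)  # 2
```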

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
expected_hash = '349c41201b62db851192665c504b350ff98c6b45fb62a8a2161f78b6534d8de9'
assert hashlib.sha256(str(q2_answer).encode()).hexdigest() == expected_hash

### Q3. Find the name of the company in the Financial sector that is not in the SP500 and has the largest marketcap.

That's quite a lot to ask!

Let's break it down. Write a query that:

* Keeps only companies that are not in the SP500 (this is a boolean column, where 0 means "not in the index")
* Keeps only companies in the 'Financial' sector
* Sorts by marketcap in descending order and keeps only the top result.

Then, assign the result to variable q3_answer.
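The three steps above map onto `JOIN`, `WHERE`, and `ORDER BY ... DESC LIMIT 1`. Here is a minimal sketch on hypothetical toy tables in `sqlite3`. The table names, the shared `ticker` column, and the `sp500` flag column are assumptions for illustration; the real schema's column names may differ, so check them in your SQL client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: three tables that share a ticker column.
conn.executescript("""
CREATE TABLE stock (ticker TEXT, name TEXT, sector TEXT);
CREATE TABLE financial (ticker TEXT, marketcap REAL);
CREATE TABLE info (ticker TEXT, sp500 INTEGER);
INSERT INTO stock VALUES ('AAA', 'Alpha Bank', 'Financial'),
                         ('BBB', 'Beta Insurance', 'Financial'),
                         ('CCC', 'Gamma Tech', 'Technology');
INSERT INTO financial VALUES ('AAA', 300.0), ('BBB', 40.0), ('CCC', 900.0);
INSERT INTO info VALUES ('AAA', 1), ('BBB', 0), ('CCC', 0);
""")

query = """
SELECT s.name
FROM stock AS s
JOIN financial AS f ON f.ticker = s.ticker
JOIN info AS i ON i.ticker = s.ticker
WHERE i.sp500 = 0              -- not in the index
  AND s.sector = 'Financial'   -- only the Financial sector
ORDER BY f.marketcap DESC      -- largest marketcap first
LIMIT 1                        -- keep only the top row
"""
(top_name,) = conn.execute(query).fetchone()
print(top_name)  # Beta Insurance
```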


In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
expected_hash = 'a7f4a1723bf1798a9cc8d89ca7d817e414afaeadfe8f8bebe1cbfc591901fcbf'
assert hashlib.sha256(q3_answer.encode()).hexdigest() == expected_hash

### Q4. Find which sectors have SP500 stocks located in the state of Louisiana (LA)

Write a query that gets the names of the unique sectors of SP500 stocks located in the state of LA.

Order the results **by marketcap** in descending order (meaning the sector that contains the company with the highest marketcap is first). Create a list with the results, and assign it to variable q4_answer.
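A subtlety here: in PostgreSQL, `SELECT DISTINCT sector ... ORDER BY marketcap` is rejected, because with `DISTINCT` every `ORDER BY` expression must appear in the select list. One way around it, sketched below with `sqlite3`, is to group by sector and order by each sector's largest marketcap. For brevity, the hypothetical toy data is flattened into a single `companies` table; against the real schema you would join the tables first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy data flattened into one table; several companies share a sector.
conn.executescript("""
CREATE TABLE companies (ticker TEXT, sector TEXT, state TEXT, sp500 INTEGER, marketcap REAL);
INSERT INTO companies VALUES
  ('AAA', 'Utilities', 'LA', 1, 20.0),
  ('BBB', 'Energy',    'LA', 1, 80.0),
  ('CCC', 'Utilities', 'LA', 1, 95.0),
  ('DDD', 'Finance',   'TX', 1, 500.0),
  ('EEE', 'Consumer',  'LA', 0, 300.0);
""")

# One row per sector, ranked by the largest marketcap found within that sector.
query = """
SELECT sector
FROM companies
WHERE state = 'LA' AND sp500 = 1
GROUP BY sector
ORDER BY MAX(marketcap) DESC
"""
sectors = [row[0] for row in conn.execute(query)]
print(sectors)  # ['Utilities', 'Energy']
```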

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
expected_hash = 'e3468dd18bd5875af0eb0ca93b8d4857d92ffa6aaa40606bc8c673466addf1e9'
assert hashlib.sha256(str(q4_answer).encode()).hexdigest() == expected_hash

### Q5. Find out what sector had the highest average stock price (`lastprice`)

Write a query to find which sector has the highest average stock price.

Assign this sector to variable q5_answer_1.

Also find the maximum price in that sector, and assign that value to q5_answer_2.
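Both aggregates can be computed per sector in a single `GROUP BY` query, then ranked by the average. This is a sketch on a hypothetical `prices` table in `sqlite3`. Since the grading cell hashes `str(q5_answer_2)`, it may be worth checking whether your value should be an integer or a float: `str(70)` and `str(70.0)` produce different hashes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy table with one price per ticker.
conn.executescript("""
CREATE TABLE prices (ticker TEXT, sector TEXT, lastprice REAL);
INSERT INTO prices VALUES
  ('AAA', 'Energy', 10.0), ('BBB', 'Energy', 30.0),
  ('CCC', 'Tech', 50.0),   ('DDD', 'Tech', 70.0), ('EEE', 'Tech', 0.0);
""")

# Compute the average and the maximum per sector, then keep the sector
# with the highest average.
query = """
SELECT sector, AVG(lastprice) AS avg_price, MAX(lastprice) AS max_price
FROM prices
GROUP BY sector
ORDER BY avg_price DESC
LIMIT 1
"""
sector, avg_price, max_price = conn.execute(query).fetchone()
print(sector, avg_price, max_price)  # Tech 40.0 70.0
```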


In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
expected_sector_hash = '34a4ace130c47bbe338b0bec84f81a50759816791386555ddaccfa747c2cfa11'
assert hashlib.sha256(q5_answer_1.encode()).hexdigest() == expected_sector_hash, "Wrong sector!"

expected_max_price_hash = '9e88bc436d06d631daa2e31b55cf86a03e210af27c17c9723eef82a5003febd7'
assert hashlib.sha256(str(q5_answer_2).encode()).hexdigest() == expected_max_price_hash, "Wrong max price!"

## Part B - Public APIs


-----------------------------------

In these exercises, the goal is to get data from a public API. We'll go full geek and use a Pokemon API hosted by the LDSA for this BLU! (Credit for the data goes to GitHub user `fanzeyi`.)

The base URL of the API is the following: https://pokemon-api.lisbondatascience.org/

To complete the exercises, you'll have to open the API's documentation (the `ui` endpoint) in your browser. More specifically, you'll have to learn which endpoints you can GET information from.
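Once you know an endpoint, `requests` can build the query string for you from a `params` dictionary. The sketch below only prepares the request, so it runs without network access and shows the exact URL that would be sent. The endpoint path `some-endpoint` and the parameter name `name` are hypothetical placeholders, not the API's real ones; take those from the `ui` page.

```python
import requests

BASE_URL = "https://pokemon-api.lisbondatascience.org/"
# Hypothetical endpoint path and query parameter; check the API docs for the real ones.
endpoint = "some-endpoint"
params = {"name": "Char"}

# Preparing a request builds the final URL without sending anything over the network.
prepared = requests.Request("GET", BASE_URL + endpoint, params=params).prepare()
print(prepared.url)  # https://pokemon-api.lisbondatascience.org/some-endpoint?name=Char

# Sending it for real would look like:
# response = requests.get(BASE_URL + endpoint, params=params)
# response.raise_for_status()
# data = response.json()
```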

<br>

<img src="media/api-image.jpg" width=600>

<br>

### Q6. Find all of Charmander's evolutions!

As you might know, Pokemon evolve as they grow. Several Pokemon keep a similar name when they evolve. Let's consider my favourite starter Pokemon, Charmander:

<br>

<img src="media/charmander.png" width=300>

<br>

Use the API to find all of Charmander's evolutions! You will have to get all Pokemon with `Char` in their name, and then filter for "Fire" type Pokemon, since a couple of the results are unrelated to Charmander.

Extract their names from the `["name"]["english"]` attribute of each result, in the order they are returned, and assign the resulting list to the `q6_answer_names` variable.

Also extract their speeds (`["base"]["Speed"]`), in the same order, and assign the resulting list to variable `q6_answer_speeds`.
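After calling `response.json()`, the filtering and extraction are plain list comprehensions, which keep the order in which results are returned. The toy payload below is hypothetical: the `["name"]["english"]` and `["base"]["Speed"]` paths come from the exercise text, but the assumption that `"type"` is a list of type names should be checked against a real response.

```python
# Hypothetical toy payload standing in for response.json().
# Assumption: "type" holds a list of type names.
results = [
    {"name": {"english": "Charfoo"}, "type": ["Fire"], "base": {"Speed": 65}},
    {"name": {"english": "Charbar"}, "type": ["Rock"], "base": {"Speed": 30}},
    {"name": {"english": "Charbaz"}, "type": ["Fire", "Flying"], "base": {"Speed": 100}},
]

# Keep only Fire-type entries, preserving the original order.
fire = [p for p in results if "Fire" in p["type"]]

# Extract the nested fields from each remaining entry.
names = [p["name"]["english"] for p in fire]
speeds = [p["base"]["Speed"] for p in fire]
print(names)   # ['Charfoo', 'Charbaz']
print(speeds)  # [65, 100]
```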

In [None]:
# Do an HTTP GET request to the Pokemon API to get information about
# all Pokemon with "Char" in their name
# response = ...
# q6_answer_names = ...
# q6_answer_speeds = ...

# YOUR CODE HERE
raise NotImplementedError()



In [None]:
assert type(q6_answer_names) == list, "Names must be in a list"
assert type(q6_answer_speeds) == list, "Speeds must be in a list"

names_hash = '4530988a30da58ce7b0045234c8499b1cc5bbf39412591a28bc49b876dba223c'
assert hashlib.sha256(str(q6_answer_names).encode()).hexdigest() == names_hash, "Wrong names!"

speeds_hash = 'e0919cb78353fd21778684cebea362f41ccaa283ce2aa8d86a190ccc9daec2aa'
assert hashlib.sha256(str(q6_answer_speeds).encode()).hexdigest() == speeds_hash, "Wrong speeds!"

### Q7. Find the strongest Pokemon moves!

Now, use a different endpoint to find out which Pokemon moves have a `power` stat of 200 or higher.

Extract their English names (the `ename` attribute of each move) and assign the resulting list to variable `q7_answer`.
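If the endpoint returns every move, the threshold can be applied in Python. This sketch uses a hypothetical toy list and assumes each move has an `ename` key and a `power` key that may be `None` for moves without a power value. The guard matters because comparing `None` with an integer raises a `TypeError` in Python 3.

```python
# Hypothetical toy list of moves; some moves are assumed to have no power value.
moves = [
    {"ename": "Tiny Tap", "power": 40},
    {"ename": "Big Boom", "power": 250},
    {"ename": "Stare", "power": None},
    {"ename": "Huge Hit", "power": 200},
]

# Skip moves without a power value before comparing, then keep power >= 200.
strong = [
    m["ename"]
    for m in moves
    if m.get("power") is not None and m["power"] >= 200
]
print(strong)  # ['Big Boom', 'Huge Hit']
```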

In [None]:
# Do an HTTP GET request to find which Pokemon moves have 200 or more power.

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert type(q7_answer) == list, "Moves must be in a list!"
assert len(q7_answer) == 2, "Wrong number of moves!"

expected_moves_hash = 'a7b4c8bc5e6e205ab29e8255537e3bb8ae04269b5b329a3e7c5984ff45542df1'
assert hashlib.sha256(str(q7_answer).encode()).hexdigest() == expected_moves_hash

## Part C - Web scraping

In this exercise, we're going to use web scraping to get data from the page of a former LDSA student, Bork Pawson!
Bork has kindly made his very simple and amateurish website available for us to scrape!

You can find his website here: https://s02-infrastructure.s3.eu-west-1.amazonaws.com/ldsa-bork/index.html

### Q8. Scrape Bork's ABSOLUTE favourite things in the world.

Bork has written down his five favourite things in the world. You can find them in a list on the website's sidebar.
Scrape the 5 items in order, using the `requests` and `BeautifulSoup` libraries, store them in a list, and assign it to the `q8_answer` variable. No cheating!
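The usual BeautifulSoup pattern is to narrow the search to a container first and then collect its list items. The markup below is hypothetical (the `aside` tag and `sidebar` class are assumptions), so inspect Bork's page to find the real tags. Note the `class_` keyword, with a trailing underscore, because `class` is a reserved word in Python.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's tags and classes may differ.
html = """
<aside class="sidebar">
  <h3>Favourite things</h3>
  <ol>
    <li>Thing one</li>
    <li>Thing two</li>
    <li>Thing three</li>
  </ol>
</aside>
"""
soup = BeautifulSoup(html, "html.parser")

# Find the container first, then every <li> inside it, in document order.
sidebar = soup.find("aside", class_="sidebar")
items = [li.get_text(strip=True) for li in sidebar.find_all("li")]
print(items)  # ['Thing one', 'Thing two', 'Thing three']
```

Against the live page, the HTML string would be replaced by `response.text` from `requests.get(url)`.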


In [None]:
# Assign the URL of the page to be scraped to variable url
# url = ...
# YOUR CODE HERE
raise NotImplementedError()

# Do a GET request to get the page content, using the url we've just defined
# response = ...
# YOUR CODE HERE
raise NotImplementedError()

# Instantiate a soup object using the response of the GET request
# YOUR CODE HERE
raise NotImplementedError()
    
# Now it's the tricky part!
# Parse the soup in order to retrieve the list of things.
# In the end, store the favourite things in a list and assign it to variable q8_answer.
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
expected_hash = 'b42c63516b06440a9481cbfbc100f23f2b47f68a008f3e073d6e67ce81a6b81e'
assert hashlib.sha256(str(sorted(q8_answer)).encode()).hexdigest() == expected_hash

### Q9. Find the tennis ball tag

Scrape the tag containing the tennis ball image in the center of the grid with Bork's favourite things.
Assign the tag (not the image content) to variable `q9_answer`.

Note: You'll have to find a different way to pass the attribute you want to filter, since the attribute name conflicts with an argument of the `find` function. You can figure out how to do this in the [BeautifulSoup documentation](https://beautiful-soup-4.readthedocs.io/en/latest/index.html?highlight=find#the-keyword-arguments)!
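As an illustration of the clash, consider an attribute literally called `name`: `find(name=...)` is read as the tag name to look for, so the attribute filter has to go through the `attrs` dictionary instead. The markup and attribute values below are hypothetical, and the attribute you need on Bork's page may be a different one.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: <img> tags whose *attribute* is called "name".
html = """
<div class="grid">
  <img src="bone.png" name="left">
  <img src="ball.png" name="center">
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# soup.find(name="center") would search for a <center> tag, because `name`
# is find()'s tag-name parameter. The attrs dict avoids that clash.
tag = soup.find("img", attrs={"name": "center"})
print(tag["src"])  # ball.png
```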

In [None]:
# Assign the URL of the page to be scraped to variable url
# url = ...
# YOUR CODE HERE
raise NotImplementedError()

# Do a GET request to get the page content, using the url we've just defined
# response = ...
# YOUR CODE HERE
raise NotImplementedError()

# Instantiate a soup object using the response of the GET request
# YOUR CODE HERE
raise NotImplementedError()

# Parse the soup in order to retrieve the tag of the tennis ball image.
# Assign it to variable q9_answer.
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
expected_hash = '369917cf8ea4d7906841cb6e6c264b124911e6d805bd122a23ffcee8fcb67de7'
assert hashlib.sha256(str(q9_answer).encode()).hexdigest() == expected_hash