# BLU03 - Exercises Notebook

In [None]:
import hashlib # for grading purposes
import json
import math
import numpy as np
import pandas as pd
import requests
import sqlalchemy
from bs4 import BeautifulSoup

## Exercise 1 - Querying a StockDatabase with a SQL client

In this exercise, you will write SQL queries to extract information from a database. Start by setting up your SQL client.

Open your favorite SQL client and connect to the StockDatabase in the cloud.
The connection settings are the following:

* host: batch-s02.ctq2kxc7kx1i.eu-west-1.rds.amazonaws.com
* port: 5432
* user: ldsa_student
* database: s02
* schema: exercises
* password: XXX (shared through Slack)

This is a different schema from the one we used in the Learning Notebooks, so don't forget to switch to it (see the Learning Notebook). It contains information about stock tickers, including their location, some financial information, and whether they are included in certain indices.
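If you prefer to query the database from Python rather than from a SQL client, the settings above can be turned into a connection URL. The sketch below is a minimal example under a few assumptions: the `postgresql://` prefix (which SQLAlchemy maps to the `psycopg2` driver by default), a placeholder password, and a hypothetical helper `build_postgres_url`. The commented-out engine line assumes `psycopg2` is installed and selects the `exercises` schema by setting PostgreSQL's `search_path` through the libpq `options` parameter.

```python
from urllib.parse import quote_plus


def build_postgres_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a PostgreSQL connection URL, escaping the password in case it has special characters."""
    return f"postgresql://{user}:{quote_plus(password)}@{host}:{port}/{database}"


url = build_postgres_url(
    user="ldsa_student",
    password="XXX",  # placeholder: use the password shared through Slack
    host="batch-s02.ctq2kxc7kx1i.eu-west-1.rds.amazonaws.com",
    port=5432,
    database="s02",
)

# Assumption: with psycopg2 installed, an engine pointing at the "exercises" schema could be created as
# engine = sqlalchemy.create_engine(url, connect_args={"options": "-csearch_path=exercises"})
```

Escaping with `quote_plus` matters because characters such as `@` or `/` in a password would otherwise be read as URL delimiters.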

Alternatively, you can connect locally to a SQLite version of the same database, stored in the file `data/StockDatabase`.

The tables in the StockDatabase are the following:

1. Stock: contains the ticker, company name, sector, and industry.
2. Financial: contains the latest price and marketcap of all tickers.
3. Location: contains information about the company's location.
4. Info: contains information about the indices in which the company is listed (SP500, for example).

You can preview these tables using the SQL client.
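If you are working with the local SQLite copy, you can also preview the tables from Python with the standard-library `sqlite3` module. This sketch lists every table together with its column names by reading `sqlite_master` and `PRAGMA table_info`. The helper `describe_tables` is hypothetical, and it runs here on an in-memory toy database with made-up columns rather than on `data/StockDatabase`.

```python
import sqlite3
from typing import Dict, List


def describe_tables(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Map every table name in a SQLite database to its list of column names."""
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    ]
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    # The table names come from sqlite_master itself, so interpolating them is safe here.
    return {
        table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
        for table in tables
    }


# Toy database standing in for data/StockDatabase (hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE stock ("Ticker" TEXT, "Company" TEXT)')
conn.execute('CREATE TABLE financial ("Company" TEXT, "Lastprice" REAL)')
print(describe_tables(conn))

# With the real file, this would be: describe_tables(sqlite3.connect("data/StockDatabase"))
```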

**Note**:

Some column names must be written in **double** quotes in the queries. For example, the `Company` column in the `financial` table is written `financial."Company"`, because PostgreSQL folds unquoted identifiers to lowercase. Remember that single quotes are reserved for string literals.
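The quoting rule can be tried out on a throwaway in-memory SQLite table. This sketch uses a made-up `financial` table whose mixed-case column names mirror the note above: double quotes delimit identifiers (table and column names), and single quotes delimit string values. In PostgreSQL, swapping them (`= "Globex, Inc."`) fails, because the double-quoted text is read as a column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy table with mixed-case column names (the rows are made up).
conn.execute('CREATE TABLE financial ("Company" TEXT, "Lastprice" REAL)')
conn.executemany(
    "INSERT INTO financial VALUES (?, ?)",
    [("Acme Corp.", 12.5), ("Globex, Inc.", 30.0)],
)

# Identifiers in double quotes, the string literal in single quotes.
query = """
SELECT financial."Lastprice"
FROM financial
WHERE financial."Company" = 'Globex, Inc.'
"""
print(conn.execute(query).fetchall())  # [(30.0,)]
```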

### Exercise 1.1 - Ticker of ExlService Holdings, Inc.

Write a query that selects the ticker of the company `ExlService Holdings, Inc.`, and run it in the SQL client.

Assign the result to the variable `answer_1_1` (just copy and paste the ticker name from the query result).

In [None]:
# answer_1_1 = ...
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert hashlib.sha256(json.dumps(answer_1_1).encode()).hexdigest() == '68717397e5ef59739a9d6d62efe5c1588e1632f82847dc9f7da8e250d0383979'

### Exercise 1.2 - Companies located in California

Write a query that counts the number of companies located in the state of California (`CA`).

Assign the result to the variable `answer_1_2`.
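The grading cells hash `json.dumps(answer)`, so the Python type of an answer matters: the integer `2` and the string `"2"` serialize differently. This sketch uses a toy `location` table with hypothetical columns and rows to show that a `COUNT(*)` fetched through `sqlite3` comes back as an `int`. It also shows that the same value stored as a string produces a different digest.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE location ("Ticker" TEXT, "State" TEXT)')
conn.executemany(
    "INSERT INTO location VALUES (?, ?)",
    [("AAA", "CA"), ("BBB", "TX"), ("CCC", "CA")],
)

# COUNT(*) returns a single row with a single column; unpack it into a plain int.
(count,) = conn.execute('SELECT COUNT(*) FROM location WHERE "State" = ?', ("CA",)).fetchone()


def grading_digest(value) -> str:
    """Same recipe as the grading cells: SHA-256 of the JSON serialization."""
    return hashlib.sha256(json.dumps(value).encode()).hexdigest()


print(count, json.dumps(count), json.dumps(str(count)))  # 2 2 "2"
print(grading_digest(count) == grading_digest(str(count)))  # False
```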

In [None]:
# answer_1_2 = ...
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert hashlib.sha256(json.dumps(answer_1_2).encode()).hexdigest() == '891d46993a36d78392247c642138cede01d9841daab1d945709755b5194597c4'

### Exercise 1.3 - Texas healthcare companies with the highest lastprice

Find the names of the 3 healthcare-sector companies located in the state of Texas (`TX`) with the highest lastprice.

Hint: Be careful with the NULL values in `Lastprice`.

The answer should be a list with the three company names, ordered by the lastprice in descending order. Assign it to the variable `answer_1_3`.
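NULLs need care here because they sort differently across databases. By default, PostgreSQL treats NULL as larger than any value, so NULLs come first in a `DESC` ordering, whereas SQLite puts them last. This sketch uses a toy table with made-up companies and a hypothetical sector spelling (`'Health Care'`; check the real spelling in the database). It drops NULLs explicitly before sorting, which keeps the top-N query correct on both engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE toy ("Company" TEXT, "Sector" TEXT, "State" TEXT, "Lastprice" REAL)')
conn.executemany(
    "INSERT INTO toy VALUES (?, ?, ?, ?)",
    [
        ("Alpha", "Health Care", "TX", 10.0),
        ("Beta", "Health Care", "TX", None),  # missing price (NULL)
        ("Gamma", "Health Care", "TX", 55.0),
        ("Delta", "Health Care", "TX", 32.0),
        ("Epsilon", "Finance", "TX", 99.0),   # other sector
        ("Zeta", "Health Care", "CA", 80.0),  # other state
    ],
)

# Filtering out NULL prices makes the result independent of the engine's NULL ordering.
query = """
SELECT "Company"
FROM toy
WHERE "Sector" = 'Health Care' AND "State" = 'TX' AND "Lastprice" IS NOT NULL
ORDER BY "Lastprice" DESC
LIMIT 3
"""
top3 = [row[0] for row in conn.execute(query)]
print(top3)  # ['Gamma', 'Delta', 'Alpha']
```

PostgreSQL also accepts `ORDER BY ... DESC NULLS LAST`, but the explicit `IS NOT NULL` filter states the intent more directly.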

In [None]:
# answer_1_3 = ...
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert hashlib.sha256(json.dumps(answer_1_3).encode()).hexdigest() == '74fb763ffbc8a38ed010df043114cb30d2be310583706031fb40a410c2f73f02'

### Exercise 1.4 - Industries with stocks in Nasdaq100 and a total marketcap below 50

Write a query that retrieves the names of the industries that have companies listed in the Nasdaq100 index and whose Nasdaq100-listed companies have a total marketcap below 50.

Order the results by the total marketcap in ascending order. Assign the list with the results to the variable `answer_1_4`.
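A condition on an aggregate such as a total can only be checked after grouping, so it belongs in `HAVING` rather than `WHERE`. This sketch uses a single toy table whose columns (`Industry`, a hypothetical 0/1 `InIndex` flag, `Marketcap`) and rows are made up. It shows the pattern: `WHERE` keeps only index-listed rows, `GROUP BY` forms one group per industry, `HAVING` filters groups by their sum, and `ORDER BY` sorts on that same sum. In the real schema, the index membership lives in a separate table and would need a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE toy ("Industry" TEXT, "InIndex" INTEGER, "Marketcap" REAL)')
conn.executemany(
    "INSERT INTO toy VALUES (?, ?, ?)",
    [
        ("Software", 1, 30.0), ("Software", 1, 40.0),  # listed total 70 -> removed by HAVING
        ("Retail", 1, 20.0), ("Retail", 0, 500.0),     # only the listed row counts: 20
        ("Biotech", 1, 5.0), ("Biotech", 1, 3.0),      # listed total 8
        ("Mining", 0, 1.0),                            # no listed companies -> no group
    ],
)

# WHERE filters rows before grouping; HAVING filters groups after aggregation.
query = """
SELECT "Industry"
FROM toy
WHERE "InIndex" = 1
GROUP BY "Industry"
HAVING SUM("Marketcap") < 50
ORDER BY SUM("Marketcap") ASC
"""
industries = [row[0] for row in conn.execute(query)]
print(industries)  # ['Biotech', 'Retail']
```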

In [None]:
# answer_1_4 = ...
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert hashlib.sha256(json.dumps(answer_1_4).encode()).hexdigest() == 'f58eb41b6b2a9ee7fb14edf0a9f81d4436dd7e0fda0a1c6cea4e004ee6d7df1e'

### Exercise 1.5 - Maximum stock price of the state with the highest average marketcap of services

Find the state whose services-sector companies have the highest average marketcap. Assign the answer to the variable `answer_1_5_state`.

Also find the maximum lastprice of the services companies in that state, rounded to 2 decimals, and assign the result to `answer_1_5_max_lastprice`.
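This exercise splits naturally into two queries: first rank the groups by an average and keep the top one, then compute a maximum restricted to that group. The sketch below runs on a toy table with hypothetical columns and values in SQLite. Two caveats apply if you run the same query in PostgreSQL. The two-argument `ROUND` there is only defined for `numeric`, so a `double precision` column would need a cast such as `ROUND(MAX("Lastprice")::numeric, 2)`. The result then arrives in Python as a `Decimal`, which `json.dumps` cannot serialize, so convert it with `float()` first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE toy ("State" TEXT, "Sector" TEXT, "Marketcap" REAL, "Lastprice" REAL)')
conn.executemany(
    "INSERT INTO toy VALUES (?, ?, ?, ?)",
    [
        ("NY", "Services", 100.0, 12.345),
        ("NY", "Services", 300.0, 45.678),   # NY services average marketcap: 200
        ("WA", "Services", 150.0, 99.999),   # WA services average marketcap: 150
        ("WA", "Technology", 900.0, 500.0),  # other sector, ignored
    ],
)

# Step 1: the state whose services companies have the highest average marketcap.
state_query = """
SELECT "State"
FROM toy
WHERE "Sector" = 'Services'
GROUP BY "State"
ORDER BY AVG("Marketcap") DESC
LIMIT 1
"""
(state,) = conn.execute(state_query).fetchone()

# Step 2: the maximum lastprice of the services companies in that state, rounded to 2 decimals.
max_query = """
SELECT ROUND(MAX("Lastprice"), 2)
FROM toy
WHERE "Sector" = 'Services' AND "State" = ?
"""
(max_lastprice,) = conn.execute(max_query, (state,)).fetchone()
print(state, max_lastprice)  # NY 45.68
```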

In [None]:
# answer_1_5_state = ...
# answer_1_5_max_lastprice = ...
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert hashlib.sha256(json.dumps(answer_1_5_state).encode()).hexdigest() == '37dccd18781685047dacce4826241dde6f22949aca3d6a5df9e303c588a4a5e5',\
'The state is not correct.'
assert hashlib.sha256(json.dumps(answer_1_5_max_lastprice).encode()).hexdigest() == '7cc36706980eddc5cb15a663306ccc43dcb9bb2138b2978e3082e2e2e742e700',\
'The maximum lastprice is not correct.'

## Exercise 2 - Public APIs

In this exercise, the goal is to get data from a public API. We'll go full geek and use a Pokemon API hosted by the LDSA for this BLU! (Credit for the data goes to the GitHub user `fanzeyi`.)

The base URL of the API is the following: https://pokemon-api.lisbondatascience.org/

To complete the exercises, you'll have to open the API's documentation (the `ui` endpoint) in your browser. More specifically, you'll need to understand which endpoints you can GET information from.
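Once you know an endpoint path from the documentation, fetching it with `requests` follows a standard pattern: join the path onto the base URL, send a GET with a timeout, fail loudly on HTTP errors, and decode the JSON body. This is a minimal sketch. The helper names `endpoint_url` and `get_json` are hypothetical, and the usage line uses a placeholder instead of a real endpoint path, since the paths are listed only in the API docs.

```python
from typing import Any, Optional
from urllib.parse import urljoin

import requests

BASE_URL = "https://pokemon-api.lisbondatascience.org/"


def endpoint_url(path: str) -> str:
    """Join an endpoint path (as listed in the API docs) onto the base URL."""
    return urljoin(BASE_URL, path.lstrip("/"))


def get_json(path: str, params: Optional[dict] = None, timeout: float = 10.0) -> Any:
    """GET an endpoint and decode its JSON body, raising on HTTP errors."""
    response = requests.get(endpoint_url(path), params=params, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx instead of parsing an error page
    return response.json()


print(endpoint_url("ui"))  # https://pokemon-api.lisbondatascience.org/ui
# Usage once you know a path from the docs (placeholder, not a real endpoint):
# data = get_json("<endpoint-from-docs>")
```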

<img src="media/api-image.jpg" width=400>

### Exercise 2.1 - Find all of Pikachu's evolutions!

As you might know, Pokemon evolve as they grow. Some Pokemon keep a similar name when they evolve. Let's consider the most famous Pokemon, Pikachu:

<img src="media/pikachu.png" width=200>

Use the API to find all of Pikachu's evolutions! Their names all end in `chu`, so you need to get the Pokemon whose names contain that substring. However, you'll also have to filter for "Electric" type Pokemon, since a couple of the results are unrelated to Pikachu.

Extract their ids from the `["id"]` attribute of each result, in the order they are returned, and assign the resulting list to the `answer_2_1_ids` variable.
Also extract their attack scores (`["base"]["Attack"]`) and assign the list to the variable `answer_2_1_attack`.
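After decoding the JSON, the filtering and extraction can be done with list comprehensions that preserve the order in which the results were returned. This sketch runs on made-up records whose `id` and `base`/`Attack` fields follow the exercise text. It assumes, and this must be checked against the real response, that the name sits under `["name"]["english"]` and the types under `["type"]` as a list of strings. If the endpoint you use already filters by name, only the type check is needed.

```python
from typing import Any, Dict, List

# Made-up records. ASSUMPTION: name under ["name"]["english"], types under ["type"] as a list;
# adjust the accessors in `matches` if the real API response is shaped differently.
records: List[Dict[str, Any]] = [
    {"id": 1, "name": {"english": "Foochu"}, "type": ["Electric"], "base": {"Attack": 40}},
    {"id": 2, "name": {"english": "Barchu"}, "type": ["Water"], "base": {"Attack": 55}},
    {"id": 3, "name": {"english": "Bazchu"}, "type": ["Electric", "Fairy"], "base": {"Attack": 70}},
    {"id": 4, "name": {"english": "Quux"}, "type": ["Electric"], "base": {"Attack": 90}},
]


def matches(record: Dict[str, Any], substring: str, wanted_type: str) -> bool:
    """True if the record's name contains `substring` and its types include `wanted_type`."""
    name = record["name"]["english"]
    return substring in name.lower() and wanted_type in record["type"]


selected = [r for r in records if matches(r, "chu", "Electric")]
ids = [r["id"] for r in selected]  # keeps the order in which the records arrived
attacks = [r["base"]["Attack"] for r in selected]
print(ids, attacks)  # [1, 3] [40, 70]
```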

In [None]:
# answer_2_1_ids = ...
# answer_2_1_attack = ...

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert isinstance(answer_2_1_ids,list), "Ids should be in a list"
assert isinstance(answer_2_1_attack,list), "Attacks should be in a list"
assert hashlib.sha256(json.dumps(answer_2_1_ids).encode()).hexdigest() == 'a160dccf2a5c35ac2760e9846997d5898bd6474af6fe776da487c5ed6b2961e5',\
'The ids are not correct.'
assert hashlib.sha256(json.dumps(answer_2_1_attack).encode()).hexdigest() == '4ab8649e9e835ddbcafe15be29b95ff7994be34434a366ac731add7a53b12921',\
'The attack scores are not correct.'

### Exercise 2.2 - Find the strongest and most accurate Pokemon move!

Now, use a different endpoint to find out which Pokemon moves have an `accuracy` of at least 95 and a `power` of 200 or higher.

Extract their `enames` (English names) and `types`, and assign the resulting lists to the variables `answer_2_2_names` and `answer_2_2_types`, respectively.
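Threshold filters on JSON data should allow for missing values. A JSON `null` becomes `None` in Python, and in Python 3 a comparison such as `None >= 95` raises a `TypeError`. This sketch filters made-up move records. The field names `ename`, `type`, `accuracy`, and `power` are assumptions based on the exercise text, as is the possibility that `accuracy` or `power` is null for some moves. Both thresholds are inclusive, matching "at least 95" and "200 or higher".

```python
from typing import Any, Dict, List, Optional

# Made-up move records. ASSUMPTION: these field names, with accuracy/power possibly null (None).
moves: List[Dict[str, Any]] = [
    {"ename": "Move A", "type": "Normal", "accuracy": 100, "power": 250},
    {"ename": "Move B", "type": "Fire", "accuracy": 90, "power": 200},
    {"ename": "Move C", "type": "Psychic", "accuracy": None, "power": 210},
    {"ename": "Move D", "type": "Dragon", "accuracy": 95, "power": 200},
    {"ename": "Move E", "type": "Water", "accuracy": 100, "power": None},
]


def at_least(value: Optional[float], threshold: float) -> bool:
    """Inclusive comparison that treats a missing (None) value as failing."""
    return value is not None and value >= threshold


strong = [m for m in moves if at_least(m["accuracy"], 95) and at_least(m["power"], 200)]
names = [m["ename"] for m in strong]
types = [m["type"] for m in strong]
print(names, types)  # ['Move A', 'Move D'] ['Normal', 'Dragon']
```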

In [None]:
# answer_2_2_names = ...
# answer_2_2_types = ...
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert isinstance(answer_2_2_names,list), "Moves names should be in a list!"
assert len(answer_2_2_names) == 2, "Wrong number of moves!"
assert hashlib.sha256(json.dumps(answer_2_2_names).encode()).hexdigest() == \
'5138c6a92b4189cd87d795605749d8e8e2b629acc5993efff5b41f1f3b6cf63e', 'The moves are not correct.'
assert isinstance(answer_2_2_types,list), "Moves types should be in a list!"
assert len(answer_2_2_types) == 2, "Wrong number of types!"
assert hashlib.sha256(json.dumps(answer_2_2_types).encode()).hexdigest() == \
'a7cca1429c8b722ff02ee415abafaae6df976356ef452b38da51f75902d329b8', 'The types are not correct.'

## Exercise 3 - Web scraping

In this exercise, we're going to use web scraping to get data from the page of a former LDSA student, Bork Pawson!
Bork has kindly made his very simple and amateurish website available for us to scrape.

You can find his website here: https://s02-infrastructure.s3.eu-west-1.amazonaws.com/ldsa-bork/index.html

### Exercise 3.1 - Scrape Bork's AWESOME honourable mentions

Bork has written 3 things that didn't fit on the webpage. You can find them listed above the images.
Scrape the 3 items in order using the `requests` and `BeautifulSoup` libraries, store them in a list, and assign it to the `answer_3_1` variable. No cheating!
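The general `requests` + `BeautifulSoup` pattern is: fetch the HTML, parse it, locate a container tag, and collect the text of its children. `find_all` returns matches in document order, so the order on the page is preserved. This sketch parses an inline, made-up HTML snippet instead of Bork's page. The `<ul class="notes">` structure is purely illustrative, so inspect the real page in your browser to find the tags that actually hold the items.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page's structure will differ.
html = """
<html><body>
  <ul class="notes">
    <li>First note</li>
    <li>Second note</li>
    <li>Third note</li>
  </ul>
  <img src="a.png">
</body></html>
"""
# For a live page, you would fetch the HTML first, e.g.:
# html = requests.get(url, timeout=10).text

soup = BeautifulSoup(html, "html.parser")
notes_list = soup.find("ul", class_="notes")  # class_ because `class` is a Python keyword
items = [li.get_text(strip=True) for li in notes_list.find_all("li")]
print(items)  # ['First note', 'Second note', 'Third note']
```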

In [None]:
# answer_3_1 = ...
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert hashlib.sha256(json.dumps(answer_3_1).encode()).hexdigest() == \
'cbf54d9cf5e5d4e010b45e09b70d576efa5930f934ce31975d2a2d2cedbf12be', 'The list is not correct.'

### Exercise 3.2 - Find the tennis ball tag

Scrape the tag containing the tennis ball image that is in the center of the grid with Bork's favourite things.
Assign the tag (not the image content) to the variable `answer_3_2`.

Note: You'll have to find a different way to pass the attribute you want to filter by, since its name conflicts with an argument of the `find` function. You can figure out how to do this in the [BeautifulSoup documentation](https://beautiful-soup-4.readthedocs.io/en/latest/index.html?highlight=find#the-keyword-arguments)!
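The documentation's general workaround is to pass the attribute in a dictionary through the `attrs` parameter instead of as a keyword argument. This sketch uses an attribute called `name` as an example, because it collides with `find()`'s own first parameter (the tag name). It runs on made-up markup, and the attribute values are hypothetical. The same `attrs` approach also works for attributes such as `data-*`, whose names are not valid Python identifiers.

```python
from bs4 import BeautifulSoup

# Made-up markup: <img> tags carrying a `name` attribute.
html = """
<div class="grid">
  <img name="left-thing" src="left.png">
  <img name="middle-thing" src="middle.png">
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# soup.find("img", name="middle-thing") fails with a TypeError, because `name` is already
# filled by the positional "img" argument. Passing the attribute through `attrs` avoids the clash.
tag = soup.find("img", attrs={"name": "middle-thing"})
print(tag)  # <img name="middle-thing" src="middle.png"/>
```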

In [None]:
# answer_3_2 = ...
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
assert hashlib.sha256(json.dumps(str(answer_3_2)).encode()).hexdigest() == \
'cafa2dfc0195df835aee9f9d4437c24ac27af69446a5aa97dc83e73ada3da3b7', 'The tag is not correct.'