# Part 3
*Explain to us how we can use your system. We should be able to run this system and query it on our own computers using your instructions. Please explain and justify any important design choices you make.*

# Overview

The code for this project is split into three parts, each running in its own Docker container:
* *MongoDB*: the NoSQL database that holds the source data.
* *API server*: the server that sits between the user and the database.
* *Jupyter Lab*: a Jupyter Lab server that makes it easier to explore the data and develop data transformation functions quickly.

The *MongoDB* container is derived from the [official community MongoDB Docker image](https://hub.docker.com/r/mongodb/mongodb-community-server). The API server is based on code I commonly use for building Flask APIs and other web services; the version used for this project is in the `api_server` folder. The Jupyter Lab service is derived from the [`Jupyter Docker Stacks` `scipy-notebook` image](https://jupyter-docker-stacks.readthedocs.io/en/latest/). All files needed to build and run each container are in the `api-server` and `notebooks` folders.

# Building the Docker Images and Starting the Containers
With the Docker engine running, run the following from the root directory containing `compose.yml`:

`docker compose up --build`

Once the containers are up, the following services are available:
* `localhost:5001/query` - The API server, on port `5001` of your `localhost`. You can query it with Postman or from the shell.
* `localhost:8888` - The Jupyter Lab server. No password is required to log in for this version of the code.
* `localhost:8081` - The MongoDB administrator panel.
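
Before querying, it can help to confirm that the three ports listed above are accepting connections. The following is a minimal sketch using only the Python standard library. The helper names `is_port_open` and `check_services` and the `SERVICES` mapping are illustrative and not part of the project code.

```python
import socket

# Ports taken from the list above; the labels are only for display.
SERVICES = {
    "API server": 5001,
    "Jupyter Lab": 8888,
    "MongoDB admin panel": 8081,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service label to whether its port is reachable."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}
```

Calling `check_services()` after `docker compose up --build` returns a dictionary of labels to booleans. An open port only shows that something is listening, not that the service has finished starting.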

# Querying the Database to answer each question for the term `love`

In [13]:
import pandas as pd
import requests
import json

In [14]:
# Use this URL when running the notebook inside the Jupyter Lab container
url = "http://api:50505/query"

# Uncomment this line instead if you are running outside of that container, on your local system
# url = "http://localhost:5001/query"

payload = json.dumps({
  "term": "love"
})
headers = {
  'Content-Type': 'application/json'
}

# Send the term to the API and fail early on an HTTP error status
response = requests.post(url, headers=headers, data=payload)
response.raise_for_status()

data = response.json()
print(data)

{'avg_likes_per_tweet': 163, 'counts_by_day': [{'count': 188, 'date': '2022-01-04'}, {'count': 1699, 'date': '2022-01-05'}, {'count': 30, 'date': '2022-01-22'}, {'count': 23684, 'date': '2022-03-01'}], 'place_ids': [], 'term': 'love', 'time_to_complete_query': 1.3691749572753906, 'times_of_day': {'afternoon': 17, 'evening': 201, 'morning': 25383, 'overnight': 0}, 'users': 19579}
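
The request above can be wrapped in a small helper so that other terms are easy to query. This is a sketch rather than part of the project code: the names `build_payload`, `missing_keys`, and `query_term` are hypothetical, the expected keys are copied from the response printed above, and the default URL assumes the `localhost:5001/query` address listed earlier. It uses `urllib.request` from the standard library in place of `requests` so that it has no third-party dependency.

```python
import json
import urllib.request

# Keys observed in the response printed above.
EXPECTED_KEYS = {
    "avg_likes_per_tweet",
    "counts_by_day",
    "place_ids",
    "term",
    "time_to_complete_query",
    "times_of_day",
    "users",
}


def build_payload(term: str) -> bytes:
    """Serialize the request body as UTF-8 encoded JSON."""
    return json.dumps({"term": term}).encode("utf-8")


def missing_keys(data: dict) -> set:
    """Return any expected keys that are absent from a response."""
    return EXPECTED_KEYS - set(data)


def query_term(term: str, url: str = "http://localhost:5001/query",
               timeout: float = 30.0) -> dict:
    """POST a term to the API and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=build_payload(term),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    absent = missing_keys(data)
    if absent:
        raise ValueError(f"response is missing keys: {sorted(absent)}")
    return data
```

With the containers running, `query_term("love")` should return the same kind of dictionary as the cell above. Inside the Jupyter Lab container, pass `url="http://api:50505/query"` instead.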


# Answering the Questions
* How many tweets were posted containing the term on each day?
* How many unique users posted a tweet containing the term?
* How many likes did tweets containing the term get, on average?
* Where (in terms of place IDs) did the tweets come from?
* What times of day were the tweets posted at?
* Which user posted the most tweets containing the term?

## How many tweets were posted containing the term on each day?

In [16]:
for record in data['counts_by_day']:
    print(f"{record['count']} tweets contained the term on {record['date']}")

188 tweets contained the term on 2022-01-04
1699 tweets contained the term on 2022-01-05
30 tweets contained the term on 2022-01-22
23684 tweets contained the term on 2022-03-01
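
Because each record carries an ISO-formatted date, the daily counts can also be summed or ranked directly. A brief sketch using the values printed above; the variable names are illustrative.

```python
from datetime import date

# Values copied from the API response shown earlier.
counts_by_day = [
    {"count": 188, "date": "2022-01-04"},
    {"count": 1699, "date": "2022-01-05"},
    {"count": 30, "date": "2022-01-22"},
    {"count": 23684, "date": "2022-03-01"},
]

# Total number of matching tweets across all days.
total_tweets = sum(record["count"] for record in counts_by_day)

# The single day with the most matching tweets.
busiest = max(counts_by_day, key=lambda record: record["count"])
busiest_day = date.fromisoformat(busiest["date"])

print(f"{total_tweets} tweets in total")
print(f"Busiest day: {busiest_day} ({busiest['count']} tweets)")
```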


## How many unique users posted a tweet containing the term?

In [17]:
print(f"{data['users']} unique users posted a tweet containing this term")

19579 unique users posted a tweet containing this term


## How many likes did tweets containing the term get, on average?

In [18]:
print(f"Tweets containing this term received {data['avg_likes_per_tweet']} likes on average")

Tweets containing this term received 163 likes on average


## Where (in terms of place IDs) did the tweets come from?
Note: Most `place_ids` were null

In [22]:
if len(data['place_ids']) == 0:
    print("There were no place ids associated with tweets containing this term")
else:
    print(f"There were {len(data['place_ids'])} place ids associated with tweets containing this term")

There were no place ids associated with tweets containing this term


## What times of day were the tweets posted at?

In [26]:
for tod, count in data['times_of_day'].items():
    print(f"{tod}: {count}")

afternoon: 17
evening: 201
morning: 25383
overnight: 0
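
The raw counts are easier to compare as percentages of the total. A short sketch using the values printed above; the variable names are illustrative, and the four buckets are used exactly as the API returns them.

```python
# Values copied from the API response shown earlier.
times_of_day = {"afternoon": 17, "evening": 201, "morning": 25383, "overnight": 0}

total = sum(times_of_day.values())

# Percentage of tweets in each bucket; guard against an empty result set.
shares = {
    bucket: (100 * count / total if total else 0.0)
    for bucket, count in times_of_day.items()
}

# Print the buckets from most to least common.
for bucket, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{bucket}: {share:.1f}%")
```

For this term, almost all tweets fall in the `morning` bucket.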


## Which user posted the most tweets containing the term?

In [27]:
data.keys()

dict_keys(['avg_likes_per_tweet', 'counts_by_day', 'place_ids', 'term', 'time_to_complete_query', 'times_of_day', 'users'])
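
The keys listed above do not include a field naming the most active user, so this question cannot be answered from the current response. One way it could be computed is to count matching tweets per user and take the maximum. The sketch below does this over made-up records. The `user_id` and `text` field names, the sample records, and the `top_user` helper are all hypothetical and not taken from the project's schema.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def top_user(tweets: Iterable[dict], term: str) -> Optional[Tuple[str, int]]:
    """Return (user_id, tweet_count) for the user with the most tweets containing term."""
    counts = Counter(
        tweet["user_id"]
        for tweet in tweets
        if term.lower() in tweet["text"].lower()
    )
    if not counts:
        return None
    return counts.most_common(1)[0]


# Made-up records for illustration only.
sample_tweets = [
    {"user_id": "u1", "text": "I love this"},
    {"user_id": "u2", "text": "Love is all you need"},
    {"user_id": "u1", "text": "love, actually"},
    {"user_id": "u3", "text": "nothing to see here"},
]

print(top_user(sample_tweets, "love"))
```

The match here is a simple case-insensitive substring test, which may differ from how the API matches terms.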