# <span style="font-size:35px;color:#3665af">Section 1: MongoDB </span>

<hr>
In this section, we will practice using MongoDB. <br>

## Pre-reqs:

You need a working environment on Google Cloud. Please refer to the provided Google Cloud instructions for this setup. <br>



<div style="font-size:30px;color:#3665af;background-color:#E9E9F5;padding:10px;">1. Environment Setup </div>

<br>

<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">1.1. Uploading your files to the cloud </div>

We need to upload the file cities.txt to a Cloud Storage bucket. In the following procedure we refer to the bucket as **bigdatasystem\_1234\_bucket**; replace that with your own bucket name.

<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">1.2. Creating a Bucket and Uploading the File </div>

1. Go to the Cloud Console.
2. From the navigation menu, select Cloud Storage, then select your bucket. If you don't have one yet, create one.
3. Drag cities.txt into the bucket.


![Console](img/Storage.png)


<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">1.3. Configuring a MongoDB Instance</div>

Google Cloud offers two ways to deploy a MongoDB instance. The first is to create a MongoDB cluster, which is too costly for this lab. The second is to run MongoDB in a container; we will use this latter approach.

To do that, open Cloud Shell: go to the Google Cloud web console, select your project, and click the Cloud Shell icon in the top right corner.

![Console launcher](img/console.png)

**After you open Cloud Shell, you should see something like this:**
<hr>

![Console](img/console2.png)

<br>

<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">1.4. Pulling Docker image</div>

To pull the Docker image, run:
<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
docker pull launcher.gcr.io/google/mongodb3:latest
</pre>


<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">1.5. Creating the necessary directories</div>

Just run:
<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
mkdir -p ~/mongo/data/shard1
mkdir -p ~/mongo/files
</pre>

Next, copy the _cities.txt_ file from the bucket into Cloud Shell:
<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
gsutil cp gs://bigdatasystem_1234_bucket/cities.txt ~/mongo/files/cities.txt
</pre>
    
<div class="alert alert-block alert-info">
<b>NOTE:</b> Remember to use your own bucket name.
</div>


<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">1.6. Running docker</div>

To create a MongoDB instance, run this command:

<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
docker run \
  --name server1 \
  -p 27017:27017 \
  -v ~/mongo/data/shard1:/data/db \
  -v ~/mongo/files:/files \
  -d \
  launcher.gcr.io/google/mongodb3
</pre>

- --name sets the name of the Docker container (here, server1)
- -p maps a host port to a container port; here both are 27017, MongoDB's default port
- -v mounts a host directory into the container; e.g. ~/mongo/data/shard1 on the host becomes /data/db in the container
- -d runs the container in detached mode (in the background); the final argument is the image to run


**Once you run this command, you should get the hexadecimal ID of the new container.**<br><br>

<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">1.7. Docker Reference:</div>

[Reference](https://docs.docker.com/) and [Cheat Sheet](https://github.com/wsargent/docker-cheat-sheet)

#### To list the Docker containers that are currently running
Run:
<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
docker ps
</pre>

#### To stop a docker container
Run:
<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
docker stop &lt;containerName&gt;
</pre>
Example:
<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
docker stop server1
</pre>

#### To remove (destroy) a stopped Docker container
Run:
<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
docker rm &lt;containerName&gt;
</pre>
Example:
<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
docker rm server1
</pre>

<div style="font-size:30px;color:#3665af;background-color:#E9E9F5;padding:10px;">2. Connecting to MongoDB </div>


To start the MongoDB client (mongo shell) connected to the admin database, run:
<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
docker exec -it server1 mongo admin
</pre>

You should get the following output:

![mongo client](img/mongoClient.png)

To exit the client, run:
<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px navy;">
quit();
</pre>


- Create the database **mydb** and the collection **cities**:

<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px navy;">
// Switch to mydb (MongoDB creates it when data is first stored)
use mydb
// Create the collection
db.createCollection("cities")
</pre>

- Verify the existence of the database (mydb) and the database collection (cities):

<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px navy;">
show dbs
show collections
</pre>


<span style="color:RED">Paste the output of the previous four commands here. </span>

<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">2.1. Load some data</div>


Quit the client so that we can use mongoimport to load the data:

<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px navy;">
quit();
</pre>

- First let's check that we have the file in the correct directory. 

<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
ls -al ~/mongo/files
</pre>
This should list cities.txt.

- We will use the **mongoimport** tool to load documents from a text file. The syntax is:
<pre style="background-color:#dddddd;padding:5px;">
mongoimport --db &lt;database&gt; --collection &lt;collection&gt; --file &lt;filepath/filename&gt;
</pre>

- To run it inside the Docker container:
<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
docker exec -it server1 mongoimport  --db mydb --collection cities --file /files/cities.txt
</pre>

<span style="color:RED">Paste the output here. </span>
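mongoimport reads JSON by default, and judging by the query results later in this section, each document has _id, city, loc, pop and state fields. Assuming cities.txt holds one JSON document per line, the stdlib sketch below counts the documents and collects their field names, which may help before answering Question 1. The helper name summarize_documents is hypothetical.

```python
import json


def summarize_documents(lines):
    """Count newline-delimited JSON documents and collect their field names.

    Hypothetical helper: assumes one JSON document per line.
    """
    count = 0
    fields = set()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        doc = json.loads(line)
        fields.update(doc.keys())
        count += 1
    return count, sorted(fields)


# One document copied from the Question 2 sample output
sample = ['{"_id": "80002", "city": "ARVADA", "loc": [-105.098402, 39.794533], '
          '"pop": 12065, "state": "CO"}']
print(summarize_documents(sample))  # (1, ['_id', 'city', 'loc', 'pop', 'state'])
```

To inspect the real file, pass it an open file object, e.g. the result of open() on ~/mongo/files/cities.txt in Cloud Shell.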


### Question 1:
Describe in plain English the content of the collection. Also, run the following command in the MongoDB client (reconnect as shown above and run use mydb first):

<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px navy;">
db.cities.find().pretty()
</pre> 

<span style="color:RED">Include a sample of the result and your description of what that command does below.</span>

<hr style="border-top: 1px solid red; margin-top: 20px; margin-bottom: 1px"></hr>

##### Using what you learned in class and the [MongoDB Reference](https://www.mongodb.com/docs/v4.4/reference/mongo-shell/), write queries for the following questions using the cities collection.

<hr style="border-top: 1px solid red; margin-top: 20px; margin-bottom: 1px"></hr>


### Question 2:
List all the cities of the State of Colorado.

<span style="color:RED">Place your code and a sample of the result here.</span>

In [None]:
To list the cities in the state of Colorado, use:
    > db.cities.find({state:'CO'}).pretty()

Sample output:
{
        "_id" : "80002",
        "city" : "ARVADA",
        "loc" : [
                -105.098402,
                39.794533
        ],
        "pop" : 12065,
        "state" : "CO"
}
{
        "_id" : "80004",
        "city" : "ARVADA",
        "loc" : [
                -105.11771,
                39.814066
        ],
        "pop" : 33260,
        "state" : "CO"
}
{
        "_id" : "80005",
        "city" : "ARVADA",
        "loc" : [
                -105.109719,
                39.842189
        ],
        "pop" : 22613,
        "state" : "CO"
}
{
        "_id" : "80010",
        "city" : "AURORA",
        "loc" : [
                -104.864618,
                39.736788
        ],
        "pop" : 27090,
        "state" : "CO"
}
{
        "_id" : "80011",
        "city" : "AURORA",
        "loc" : [
                -104.815233,
                39.737809
        ],
        "pop" : 36021,
        "state" : "CO"
}

Because each document is one ZIP code, a city name appears once per ZIP code. To list only the city names and omit the other fields, use a projection:
    > db.cities.find({ state: "CO" }, { city: 1, _id: 0 }).pretty()
Sample output:
{ "city" : "ARVADA" }
{ "city" : "ARVADA" }
{ "city" : "ARVADA" }
{ "city" : "AURORA" }
{ "city" : "AURORA" }
{ "city" : "AURORA" }
{ "city" : "AURORA" }
{ "city" : "ARVADA" }
{ "city" : "AURORA" }
{ "city" : "AURORA" }
{ "city" : "AURORA" }
{ "city" : "AURORA" }
{ "city" : "AURORA" }
{ "city" : "AURORA" }
{ "city" : "BROOMFIELD" }
{ "city" : "WESTMINSTER" }
{ "city" : "COMMERCE CITY" }
{ "city" : "LOUISVILLE" }
{ "city" : "LAFAYETTE" }
{ "city" : "WHEAT RIDGE" }
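The projection above repeats a city name once per ZIP code. In the shell, db.cities.distinct("city", { state: "CO" }) returns each matching value only once. The Python sketch below imitates that on a few documents copied from the sample output; distinct_values is a hypothetical helper, not a MongoDB API, and it sorts the result only for readability.

```python
def distinct_values(documents, field, query):
    """Return the distinct values of `field` among documents matching `query`.

    `query` is a dict of exact-match conditions, e.g. {"state": "CO"}.
    Documents without `field` are ignored; the result is sorted for readability.
    """
    seen = set()
    for doc in documents:
        if field in doc and all(doc.get(k) == v for k, v in query.items()):
            seen.add(doc[field])
    return sorted(seen)


docs = [
    {"_id": "80002", "city": "ARVADA", "pop": 12065, "state": "CO"},
    {"_id": "80004", "city": "ARVADA", "pop": 33260, "state": "CO"},
    {"_id": "80010", "city": "AURORA", "pop": 27090, "state": "CO"},
    {"_id": "60623", "city": "CHICAGO", "pop": 112047, "state": "IL"},
]
print(distinct_values(docs, "city", {"state": "CO"}))  # ['ARVADA', 'AURORA']
```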



### Question 3:
List the first 10 cities of the State of Colorado.

<span style="color:RED">Place your code and a sample of the result here.</span>

In [None]:
To list the first 10 cities of Colorado (in the collection's natural order):
        > db.cities.find({ state: "CO" }).limit(10).pretty()
Result:
{
        "_id" : "80002",
        "city" : "ARVADA",
        "loc" : [
                -105.098402,
                39.794533
        ],
        "pop" : 12065,
        "state" : "CO"
}
{
        "_id" : "80004",
        "city" : "ARVADA",
        "loc" : [
                -105.11771,
                39.814066
        ],
        "pop" : 33260,
        "state" : "CO"
}
{
        "_id" : "80005",
        "city" : "ARVADA",
        "loc" : [
                -105.109719,
                39.842189
        ],
        "pop" : 22613,
        "state" : "CO"
}
{
        "_id" : "80010",
        "city" : "AURORA",
        "loc" : [
                -104.864618,
                39.736788
        ],
        "pop" : 27090,
        "state" : "CO"
}
{
        "_id" : "80011",
        "city" : "AURORA",
        "loc" : [
                -104.815233,
                39.737809
        ],
        "pop" : 36021,
        "state" : "CO"
}
{
        "_id" : "80012",
        "city" : "AURORA",
        "loc" : [
                -104.837693,
                39.698672
        ],
        "pop" : 37711,
        "state" : "CO"
}
{
        "_id" : "80013",
        "city" : "AURORA",
        "loc" : [
                -104.784566,
                39.657457
        ],
        "pop" : 45335,
        "state" : "CO"
}
{
        "_id" : "80003",
        "city" : "ARVADA",
        "loc" : [
                -105.065549,
                39.828572
        ],
        "pop" : 32980,
        "state" : "CO"
}
{
        "_id" : "80016",
        "city" : "AURORA",
        "loc" : [
                -104.741734,
                39.618713
        ],
        "pop" : 4085,
        "state" : "CO"
}
{
        "_id" : "80017",
        "city" : "AURORA",
        "loc" : [
                -104.788093,
                39.694827
        ],
        "pop" : 25910,
        "state" : "CO"
}

### Question 4:
List the 10 cities of the State of Colorado with largest populations.

<span style="color:RED">Place your code and a sample of the result here.</span>

In [None]:
Each document is one ZIP code, so the query below ranks Colorado ZIP-code areas by population rather than whole cities:
> db.cities.find({ state: "CO" }).sort({ pop: -1 }).limit(10)

Result:
{ "_id" : "80123", "city" : "BOW MAR", "loc" : [ -105.07766, 39.596854 ], "pop" : 59418, "state" : "CO" }
{ "_id" : "80221", "city" : "FEDERAL HEIGHTS", "loc" : [ -105.007985, 39.840562 ], "pop" : 54069, "state" : "CO" }
{ "_id" : "80631", "city" : "GARDEN CITY", "loc" : [ -104.704756, 40.413968 ], "pop" : 53905, "state" : "CO" }
{ "_id" : "80219", "city" : "DENVER", "loc" : [ -105.034134, 39.695624 ], "pop" : 48234, "state" : "CO" }
{ "_id" : "80501", "city" : "LONGMONT", "loc" : [ -105.10095, 40.177921 ], "pop" : 47166, "state" : "CO" }
{ "_id" : "80013", "city" : "AURORA", "loc" : [ -104.784566, 39.657457 ], "pop" : 45335, "state" : "CO" }
{ "_id" : "80030", "city" : "WESTMINSTER", "loc" : [ -105.037086, 39.854238 ], "pop" : 43235, "state" : "CO" }
{ "_id" : "80110", "city" : "CHERRY HILLS VIL", "loc" : [ -104.990022, 39.646027 ], "pop" : 40226, "state" : "CO" }
{ "_id" : "80303", "city" : "BOULDER", "loc" : [ -105.239178, 39.991381 ], "pop" : 39860, "state" : "CO" }
{ "_id" : "80906", "city" : "COLORADO SPRINGS", "loc" : [ -104.819893, 38.790164 ], "pop" : 38856, "state" : "CO" }

### Question 5:
List the 10 cities with largest populations.

<span style="color:RED">Place your code and a sample of the result here.</span>

In [None]:
The ten documents (ZIP-code areas) with the largest populations:
    > db.cities.find().sort({ pop: -1 }).limit(10)
Result:
{ "_id" : "60623", "city" : "CHICAGO", "loc" : [ -87.7157, 41.849015 ], "pop" : 112047, "state" : "IL" }
{ "_id" : "11226", "city" : "BROOKLYN", "loc" : [ -73.956985, 40.646694 ], "pop" : 111396, "state" : "NY" }
{ "_id" : "10021", "city" : "NEW YORK", "loc" : [ -73.958805, 40.768476 ], "pop" : 106564, "state" : "NY" }
{ "_id" : "10025", "city" : "NEW YORK", "loc" : [ -73.968312, 40.797466 ], "pop" : 100027, "state" : "NY" }
{ "_id" : "90201", "city" : "BELL GARDENS", "loc" : [ -118.17205, 33.969177 ], "pop" : 99568, "state" : "CA" }
{ "_id" : "60617", "city" : "CHICAGO", "loc" : [ -87.556012, 41.725743 ], "pop" : 98612, "state" : "IL" }
{ "_id" : "90011", "city" : "LOS ANGELES", "loc" : [ -118.258189, 34.007856 ], "pop" : 96074, "state" : "CA" }
{ "_id" : "60647", "city" : "CHICAGO", "loc" : [ -87.704322, 41.920903 ], "pop" : 95971, "state" : "IL" }
{ "_id" : "60628", "city" : "CHICAGO", "loc" : [ -87.624277, 41.693443 ], "pop" : 94317, "state" : "IL" }
{ "_id" : "90650", "city" : "NORWALK", "loc" : [ -118.081767, 33.90564 ], "pop" : 94188, "state" : "CA" }
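Because every document is a ZIP code, Questions 4 and 5 rank ZIP-code areas. Ranking whole cities requires summing pop per city first (keyed by city and state, since the same name can occur in several states); in MongoDB that is the job of an aggregation pipeline with $group, $sort and $limit stages. The Python sketch below shows the same group, sum, sort and limit logic on the ten documents listed above; top_cities is a hypothetical helper, and because it sees only those ten documents its totals are not the true city populations.

```python
from collections import defaultdict


def top_cities(documents, n):
    """Sum 'pop' per (city, state) and return the n largest totals.

    Mirrors a $group (sum of pop) -> $sort (descending) -> $limit pipeline.
    """
    totals = defaultdict(int)
    for doc in documents:
        totals[(doc["city"], doc["state"])] += doc["pop"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# The ten ZIP-code documents from the Question 5 result (loc omitted)
docs = [
    {"_id": "60623", "city": "CHICAGO", "pop": 112047, "state": "IL"},
    {"_id": "11226", "city": "BROOKLYN", "pop": 111396, "state": "NY"},
    {"_id": "10021", "city": "NEW YORK", "pop": 106564, "state": "NY"},
    {"_id": "10025", "city": "NEW YORK", "pop": 100027, "state": "NY"},
    {"_id": "90201", "city": "BELL GARDENS", "pop": 99568, "state": "CA"},
    {"_id": "60617", "city": "CHICAGO", "pop": 98612, "state": "IL"},
    {"_id": "90011", "city": "LOS ANGELES", "pop": 96074, "state": "CA"},
    {"_id": "60647", "city": "CHICAGO", "pop": 95971, "state": "IL"},
    {"_id": "60628", "city": "CHICAGO", "pop": 94317, "state": "IL"},
    {"_id": "90650", "city": "NORWALK", "pop": 94188, "state": "CA"},
]
for (city, state), pop in top_cities(docs, 3):
    print(city, state, pop)
# CHICAGO IL 400947
# NEW YORK NY 206591
# BROOKLYN NY 111396
```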

<hr style="border: 1px double blue;" >

## Map-Reduce with MongoDB

MongoDB can also run map-reduce jobs. Let's count the number of documents (one per ZIP code) in each state:

<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px navy;">
db.cities.mapReduce(
                    function()           { emit(this.state, 1); },
                    function(key,values) { return Array.sum(values); },
                    { out: "citiesPerState" }
                   )
</pre>

This code writes its result to a new collection, **citiesPerState**, instead of displaying it.
Use the commands discussed before to list the collections and to view the content of the new collection.
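To make the roles of the map and reduce functions concrete, here is a small in-memory Python imitation of the citiesPerState job; map_reduce is a hypothetical helper, not a MongoDB API. It follows two documented MongoDB behaviors: reduce is not called for a key that has only one emitted value, and each output document has the form { _id: key, value: result }. MongoDB may also call reduce again on partial results, so reduce must return the same type as the emitted values (summing counts satisfies this). MongoDB 5.0 and later deprecate mapReduce in favor of the aggregation pipeline.

```python
from collections import defaultdict


def map_reduce(documents, map_fn, reduce_fn):
    """In-memory imitation of db.collection.mapReduce (hypothetical helper).

    map_fn(doc) returns (key, value) pairs, playing the role of emit();
    reduce_fn(key, values) combines all values emitted for one key.
    """
    emitted = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            emitted[key].append(value)

    output = []
    for key in sorted(emitted):  # sorted by key for a deterministic result
        values = emitted[key]
        # Like MongoDB, skip reduce when a key has a single emitted value
        result = values[0] if len(values) == 1 else reduce_fn(key, values)
        output.append({"_id": key, "value": result})
    return output


docs = [
    {"_id": "80002", "city": "ARVADA", "pop": 12065, "state": "CO"},
    {"_id": "80004", "city": "ARVADA", "pop": 33260, "state": "CO"},
    {"_id": "80010", "city": "AURORA", "pop": 27090, "state": "CO"},
    {"_id": "60623", "city": "CHICAGO", "pop": 112047, "state": "IL"},
]

# Same logic as the job above: emit(this.state, 1), then sum the values
cities_per_state = map_reduce(
    docs,
    map_fn=lambda doc: [(doc["state"], 1)],
    reduce_fn=lambda key, values: sum(values),
)
print(cities_per_state)  # [{'_id': 'CO', 'value': 3}, {'_id': 'IL', 'value': 1}]
```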


### Question 6:
<span style="color:RED">Place your code output (map-reduce) here.</span>


### Question 7:
<span style="color:RED">Place the list of collections here.</span>

### Question 8:
<span style="color:RED">Place the content of the new collection here.</span>

### Map-Reduce - Question 9:
Generate a collection called **populationPerState** that contains the population of each state.

<span style="color:RED">Place your code, code output and a sample of the result here.</span>

### Map-Reduce - Question 10:
Generate a collection called **totalPopulation** that contains the entire population of the USA.

<span style="color:RED">Place your code, code output and a sample of the result here.</span>
> Hint: you can use the collection you computed in the previous question.

<span style="text-align:center;font-size:30px;color:#2F632A">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;STOP AND DELETE YOUR DOCKER WHEN YOU ARE FINISHED 
</span>
<br>

<hr style="border-top: 5px solid purple; margin-top: 1px; margin-bottom: 1px"></hr>

# <span style="font-size:35px;color:#3665af">Section 2: REDIS </span>
<hr>

The objective of this part of the assignment is to introduce the use of REDIS (an in-memory data store) to collect data from various sources for subsequent data processing. 

For this assignment, we will use REDIS on Google Cloud and the Twitter Python library (tweepy).

<b><u>Notebook Layout (Table of contents):</u></b>
1. Environment Set-up
   - Deployment of a REDIS Cluster
   - Network Configuration
   - Enabling Remote Access
   - Installing Python libraries
2. Getting Familiar with REDIS
3. Retrieving Information From Twitter
   - Creating Twitter Credentials
   - Accessing Twitter
   - Saving Tweets to REDIS
   - Retrieving Tweets from REDIS
4. Load Database From CSV


<div style="font-size:30px;color:#3665af;background-color:#E9E9F5;padding:10px;">1. Environment Setup </div>

For this assignment we will use a REDIS deployment on Google Cloud; follow the instructions below to set up the cluster. You can install REDIS on your own system instead, but we don't recommend this approach.<br><br>

<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">1.1. Deployment of a REDIS Cluster </div>

Use the search box to find REDIS (Google Click to Deploy).

<img src="img/redis_launcher.png" style="width:1000px;">

We will deploy a REDIS cluster using three small nodes:
- **Name**: redis-1
- **Zone**: us-central1-f
- **Instance Count**: 3
- **Machine Type**: e2-small (2 GB memory)
- **Boot Disk size**: 50GB

All other parameters are set to default values. The cost of this deployment is the same as that of any other Virtual Machine (VM) created on the cloud. Your deployment configuration should be similar to this:<br>

<img src="img/redis_deployment.png" style="width:1000px;">



<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">1.2. Network Configuration </div>

Once your cluster is deployed, we need to set up a firewall rule that allows access from our Jupyter notebook. To do that, open the menu and select **VPC Network**. Then select Firewall Rules and create a new rule as shown below:

<img src="img/vpc_network.png" style="width:1000px;">



<img src="img/fire_rule2.png" style="width:1000px;">


<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">
    1.3. Enabling Remote Access to REDIS
</div>

After creating the firewall rule, we also need to configure REDIS to allow access from the external network. Follow this procedure to accomplish that:

- Select the Compute Engine in the navigation menu
- Open an SSH terminal to the main server by clicking SSH next to the _redis-1-db-vm-0_ VM in the Compute Engine VM list.

<img src="img/redis_vm.png" style="width:550px;">

- Disable the REDIS password:

<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
sudo nano /etc/redis/redis.conf
</pre>

Use the arrow keys to scroll down to the line beginning with "requirepass", then comment it out by adding "#" at the start of the line.

Press Ctrl+X, then Y, then Enter to save the file and exit.

- Restart REDIS

<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
sudo service redis-server restart
</pre>
 
- Launch the REDIS Client on the console 

<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
redis-cli
</pre>

- Within the client, disable protected mode:

<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px navy;">
CONFIG SET protected-mode no
</pre>
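You should get an **OK** message. CONFIG SET changes only the running server, so this setting is lost when REDIS restarts unless you also run CONFIG REWRITE. Then type: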
<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px navy;">
quit
</pre>

Once this is completed, we should be able to access REDIS through the VM's external IP address, shown in the VM list:

<img src="img/redis_vm_ip.png" style="width:550px;">


<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">
    1.4. Installing Python Libraries
</div>

You need to install two libraries for this assignment: 
- REDIS library, and
- Twitter library.

- From your notebook environment (click the JupyterHub logo at the top left), open a new terminal (New -> Terminal) and run:

<pre style="background-color: #ebece4;padding: 10px;border-left: solid 4px orange;">
pip install tweepy
pip install redis
</pre>




<div style="font-size:30px;color:#3665af;background-color:#E9E9F5;padding:10px;">
    2. Getting Familiar with <b>REDIS</b> 
</div>

You can read about the REDIS commands [here](https://redis.io/commands).

In [1]:
##Import the library
import redis

In [2]:
## Connect to the server
REDIS_SERVER = '35.202.201.207'   # replace with the external IP of your redis-1-db-vm-0 VM
REDIS_PORT   = 6379
myRedis = redis.StrictRedis(host=REDIS_SERVER, port=REDIS_PORT, db=0)

In [3]:
## Delete all keys: flushdb clears the current database (db 0), flushall clears every database
display(myRedis.flushdb())
display(myRedis.flushall())

True

True

In [4]:
print("I'm storing a value on the key 'myKey'")
display(myRedis.set('myKey', 'This is the key value'))
print("I'm reading the value of 'myKey' from REDIS")
display(myRedis.get('myKey'))

I'm storing a value on the key 'myKey'


True

I'm reading the value of 'myKey' from REDIS


b'This is the key value'

In [5]:
print("I'm storing a List on REDIS")

print("Adding elements to the end of the list")
display(myRedis.rpush('weekdays','Tuesday'))
display(myRedis.rpush('weekdays','Wednesday'))
display(myRedis.rpush('weekdays','Thursday'))
display(myRedis.rpush('weekdays','Friday'))
print("Current List Length:", myRedis.llen('weekdays'))
display("Current Weekdays Content:",myRedis.lrange('weekdays',0,-1))


print("Adding elements to the beginning of the list")
display(myRedis.lpush('weekdays','Monday'))

print("Current List Length:", myRedis.llen('weekdays'))
display("Current Weekdays Content:",myRedis.lrange('weekdays',0,-1))


I'm storing a List on REDIS
Adding elements to the end of the list


1

2

3

4

Current List Length: 4


'Current Weekdays Content:'

[b'Tuesday', b'Wednesday', b'Thursday', b'Friday']

Adding elements to the beginning of the list


5

Current List Length: 5


'Current Weekdays Content:'

[b'Monday', b'Tuesday', b'Wednesday', b'Thursday', b'Friday']
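RPUSH and LPUSH return the new length of the list, which is why the cell above displays 1 through 5. LRANGE's stop index is inclusive, and negative indices count from the end, so 0, -1 means the whole list. The plain-Python model below reproduces those rules so they can be checked without a server; RedisListModel is a hypothetical class, not part of redis-py.

```python
class RedisListModel:
    """Plain-Python model of a Redis list (hypothetical; not part of redis-py)."""

    def __init__(self):
        self.items = []

    def rpush(self, *values):
        """Append values at the tail; return the new length, like RPUSH."""
        self.items.extend(values)
        return len(self.items)

    def lpush(self, *values):
        """Insert values at the head one at a time; return the new length, like LPUSH."""
        for value in values:
            self.items.insert(0, value)
        return len(self.items)

    def lrange(self, start, stop):
        """Return items from start to stop, both inclusive, like LRANGE."""
        n = len(self.items)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        if start > stop or start >= n:
            return []
        return self.items[start:min(stop, n - 1) + 1]


weekdays = RedisListModel()
print([weekdays.rpush(day) for day in ["Tuesday", "Wednesday", "Thursday", "Friday"]])  # [1, 2, 3, 4]
print(weekdays.lpush("Monday"))  # 5
print(weekdays.lrange(0, -1))    # ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
print(weekdays.lrange(0, 99))    # a stop past the end is clamped, so the same five items
```

Note that LPUSH with several values inserts them one by one, so lpush("a", "b", "c") leaves the list as c, b, a.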

In [6]:
print("I'm storing a HASH on REDIS")
print("- Remember that hashes can be used to store documents!")

#create a dictionary
user = {"Name"    :"myName", 
        "Company" :"myCompany", 
        "Address" :"myAddress", 
        "Location":"MyLocation"}

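print ("Store to REDIS")
# hmset still works but is deprecated in newer redis-py releases;
# myRedis.hset("userDictionary", mapping=user) is the replacement
display(myRedis.hmset("userDictionary", user))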

print ("Retrieve from REDIS")
display(myRedis.hgetall("userDictionary"))


I'm storing a HASH on REDIS
- Remember that hashes can be used to store documents!
Store to REDIS




True

Retrieve from REDIS


{b'Name': b'myName',
 b'Company': b'myCompany',
 b'Address': b'myAddress',
 b'Location': b'MyLocation'}
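A REDIS hash is flat: each field holds a single string. Recent redis-py versions convert int and float values to strings but reject other types such as lists, dicts, booleans and None. Storing a richer document, such as a city from Section 1 with its loc coordinate pair, therefore needs nested values serialized first. The helper below does this with JSON; both the choice of JSON and the name to_redis_mapping are assumptions for illustration. Its result could then be passed as myRedis.hset(key, mapping=...).

```python
import json


def to_redis_mapping(document):
    """Flatten a document into values a REDIS hash can hold (hypothetical helper).

    str, bytes, int and float values are kept as they are; everything else
    (lists, dicts, booleans, None) is encoded as a JSON string.
    """
    mapping = {}
    for field, value in document.items():
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, (str, bytes, int, float)) and not isinstance(value, bool):
            mapping[field] = value
        else:
            mapping[field] = json.dumps(value)
    return mapping


city = {"_id": "80002", "city": "ARVADA", "loc": [-105.098402, 39.794533],
        "pop": 12065, "state": "CO"}
print(to_redis_mapping(city))
# {'_id': '80002', 'city': 'ARVADA', 'loc': '[-105.098402, 39.794533]', 'pop': 12065, 'state': 'CO'}
```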

<div style="font-size:30px;color:#3665af;background-color:#E9E9F5;padding:10px;">
    3. Retrieving Information From Twitter</div>
<br>
<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">
    3.1. Creating Twitter Credentials
</div>

To access tweets from our application, we need a Twitter account, consumer keys, and access tokens.

To generate these, go to https://apps.twitter.com
and **Create a New App**. Fill in the form and agree to the terms.

Once that's done, select your app and open the **Keys and Access Tokens** tab.

<img src="img/twitter.png" style="width:550px;">


<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">
    3.2. Accessing Twitter
</div>


In [None]:
import tweepy
from tweepy import OAuthHandler
 
consumer_key    = 'PLACE_YOUR_KEYS'
consumer_secret = 'PLACE_YOUR_KEYS'
access_token    = 'PLACE_YOUR_KEYS'
access_secret   = 'PLACE_YOUR_KEYS'
 
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
 
api = tweepy.API(auth)

In [None]:
for tweet in tweepy.Cursor(api.home_timeline).items(2):
    # Process a single tweet

    print(tweet._json.keys())
    print()
    print(tweet._json["id"])
    print(tweet._json["text"])
    print(tweet._json["source"])
    print(tweet._json["lang"])
    print(tweet._json["retweeted"])    
    print(tweet._json["retweet_count"])
    print(tweet._json["favorite_count"])
    print()
    

<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">
    3.3. Saving Tweets to REDIS
</div>


In [None]:
# Let's save tweets

howManyTweets = 20

for tweet in tweepy.Cursor(api.home_timeline).items(howManyTweets):
    # Process a single tweet

    ## Formatting the tweet (each hash field holds a flat string or number;
    ## redis-py encodes str values as UTF-8 itself)
    redisTweet = {
                  "text"           :tweet._json["text"],
                  "source"         :tweet._json["source"],
                  "lang"           :tweet._json["lang"],
                  "retweet_count"  :tweet._json["retweet_count"],
                  "favorite_count" :tweet._json["favorite_count"]
                 }

    ## Saving the tweet as a HASH keyed by the tweet id
    ## (hset with mapping= replaces the deprecated hmset)
    myRedis.hset(tweet._json["id"], mapping=redisTweet)

    ## Adding the tweet id to the list of tweets
    myRedis.rpush("tweets", str(tweet._json["id"]))

print("Done!")

<div style="font-size:20px;color:#F1F8FC;background-color:#0095EA;padding:10px;">
    3.4. Retrieving Tweets from REDIS
</div>


In [None]:
for tweet_id in myRedis.lrange("tweets", 0, 99):   # up to the first 100 ids; LRANGE's stop index is inclusive
    print()
    print("Displaying Tweet with ID:", tweet_id)
    print("Text:", myRedis.hmget(tweet_id, "text"))
    print("ALL DATA:", myRedis.hgetall(tweet_id))
    print("========================================================================")

<div style="font-size:30px;color:#3665af;background-color:#E9E9F5;padding:10px;">
    4. Load Database From CSV
</div>

Unzip "CITES Wildlife Trade Database.zip" to get "comptab_2018-01-29 16_00_comma_separated.csv".
### Question 11:
Load this CSV file into REDIS and compute the number of animals per Order for each importer. Your output should be similar to this:
<br>

US . Carnivora . XXX
<br>
US . Falconiformes . XXX
<br>
US . Artiodactyla . XXX
<br>
...
<br>
AD . Acipenseriformes . XXX
<br>
AD . Falconiformes . XXX
<br>
AD . Carnivora . XXX
<br>
...

Measure the running time, and present the average running time for the load and the processing operations.

**Please also explain your code**
<br>
Hint: You can download CITES Wildlife Trade Database.zip, unzip it and open comptab_2018-01-29 16_00_comma_separated.csv to see the data

<span style="color:RED">PLACE YOUR ANSWERS/CODE IN CELLS BELOW</span>

In [16]:
import pandas as pd
import redis
import time

# Connect to Redis
REDIS_SERVER = '35.202.201.207'
REDIS_PORT   = 6379
myRedis = redis.StrictRedis(host=REDIS_SERVER, port=REDIS_PORT, db=0)

# Read the CSV and compute, per (Importer, Order), the mean of 'Importer reported quantity'.
# Groups in which no importer quantity was reported come out as NaN (printed as "nan" below).
df=pd.read_csv('comptab_2018-01-29 16_00_comma_separated.csv')
f=df.groupby(['Importer','Order'])['Importer reported quantity'].mean()

# The timer starts after the CSV read and groupby, so it measures only the Redis writes and the read-back
start_time = time.time()
#Loading into Redis: one HSET round trip per (Importer, Order) pair, field name "Importer.Order"
for key, value in f.items():
    hash_field = f"{key[0]}.{key[1]}"
    myRedis.hset("animalsperorder", hash_field, value)
#Retrieving rows from Redis
values=myRedis.hgetall("animalsperorder")
end_time = time.time()

# Redis returns bytes; decode field names and values to str before printing
decoded_values = {}
for key, value in values.items():
    decoded_key = key.decode()
    decoded_value = value.decode()
    decoded_values[decoded_key] = f"{decoded_value}"
for key, value in decoded_values.items():
    print(f"{key}.{value}")
elapsed_time_toload = end_time - start_time
print(f"Running time to load and process: {elapsed_time_toload:.2f} seconds")

KY.Liliales.nan
KE.Serpentes.nan
NO.Sapindales.nan
AO.Orchidales.nan
TH.Galliformes.nan
CO.Anseriformes.nan
BH.Myrtales.nan
YE.Myrtales.nan
ES.Carnivora.5.823529411764706
TR.Perissodactyla.1.1666666666666667
LK.Crocodylia.nan
JO.Scleractinia.nan
AE.Gentianales.nan
GH.Liliales.nan
LB.Nepenthales.nan
BG.Proboscidea.2.0
DZ.Orchidales.nan
DE.Caudata.20.0
FR.Pholidota.41.0
MU.Serpentes.nan
DZ.Caryophyllales.nan
ZW.Artiodactyla.6.666666666666667
ES.Falconiformes.2.2564102564102564
TN.Ranunculales.nan
SN.Psittaciformes.nan
CO.Cyatheales.nan
KP.Pholidota.nan
UA.Primulales.nan
MW.Crocodylia.nan
CI.Caryophyllales.nan
AG.Sapindales.nan
PK.Sapindales.nan
IL.Nepenthales.nan
IL.Proboscidea.nan
SR.Columbiformes.nan
RE.Euphorbiales.nan
BA.Serpentes.nan
SE.Osteoglossiformes.15.0
XX.Veneroida.nan
CA.Carcharhiniformes.nan
BR.Psittaciformes.nan
SX.Euphorbiales.nan
ES.Fabales.21.174333333333333
TW.Passeriformes.nan
IR.Testudines.nan
IS.Veneroida.nan
SG.Rosales.nan
CH.Laurales.nan
GG.Serpentes.10.0
CY.Cycad

In [18]:
import pandas as pd
import redis
import time

# Connect to Redis
REDIS_SERVER = '35.202.201.207'
REDIS_PORT   = 6379
myRedis = redis.StrictRedis(host=REDIS_SERVER, port=REDIS_PORT, db=0)
df=pd.read_csv('comptab_2018-01-29 16_00_comma_separated.csv')
f=df.groupby(['Importer','Order'])['Importer reported quantity'].mean()
n = 10
start_time = time.time()
for i in range(n):
    # Loading into Redis
    for key, value in f.items():
        hash_field = f"{key[0]}.{key[1]}"
        myRedis.hset("animalsperorder", hash_field, value)
    # Retrieving rows from Redis
    values=myRedis.hgetall("animalsperorder")
end_time = time.time()
avg_time = (end_time - start_time) / n
print(f"Average time to load and process: {avg_time:.2f} seconds")

Average time to load and process: 42.34 seconds
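The answer above averages 'Importer reported quantity' (hence the nan values where no importer quantity exists) and does the grouping in pandas before anything reaches REDIS. Since the question asks for a number of animals, one alternative reading is to sum the quantities and let REDIS do the counting while the file is loaded. The sketch below assumes that reading, and also assumes each reported quantity counts animals (the dataset may record other units, which would need filtering first). It streams the CSV with the standard csv module, skips rows without an importer-reported quantity, adds each quantity to an "Importer.Order" field with HINCRBYFLOAT, and batches the commands through a redis-py pipeline. The hash name animals_per_order, the helper names and the batch size are hypothetical choices.

```python
import csv
import time

QTY_FIELD = "Importer reported quantity"


def iter_increments(rows):
    """Yield ('Importer.Order', quantity) for every row that reports a quantity."""
    for row in rows:
        qty = (row.get(QTY_FIELD) or "").strip()
        if qty:  # rows with an empty importer quantity are skipped
            yield f"{row['Importer']}.{row['Order']}", float(qty)


def load_and_count(client, csv_path, hash_name="animals_per_order", batch_size=1000):
    """Stream the CSV into REDIS and let REDIS sum the quantities.

    client is a redis-py client such as myRedis; each row becomes one
    HINCRBYFLOAT on the hash, and commands are sent in batches through a
    pipeline instead of one network round trip per command.
    """
    client.delete(hash_name)  # start from an empty hash so reruns do not double count
    pipe = client.pipeline(transaction=False)
    with open(csv_path, newline="") as f:
        for i, (field, qty) in enumerate(iter_increments(csv.DictReader(f)), start=1):
            pipe.hincrbyfloat(hash_name, field, qty)
            if i % batch_size == 0:
                pipe.execute()  # send the queued batch in one round trip
    pipe.execute()  # send the remaining commands


def format_counts(raw_hash):
    """Turn HGETALL output (bytes fields and values) into 'US . Carnivora . 12' lines."""
    lines = []
    for field, total in sorted(raw_hash.items()):
        importer, order = field.decode().split(".", 1)
        value = float(total.decode())
        shown = str(int(value)) if value.is_integer() else str(value)
        lines.append(f"{importer} . {order} . {shown}")
    return lines


def average_seconds(func, repeats=3):
    """Call func `repeats` times and return the mean wall-clock time in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats


# Example use with the myRedis client from the cells above:
# csv_file = "comptab_2018-01-29 16_00_comma_separated.csv"
# print("load:", average_seconds(lambda: load_and_count(myRedis, csv_file)))
# print("read:", average_seconds(lambda: format_counts(myRedis.hgetall("animals_per_order"))))
# print("\n".join(format_counts(myRedis.hgetall("animals_per_order"))))
```

Because the counting happens during the load here, the two timings to report are the load (including the HINCRBYFLOAT updates) and the read-back with formatting. Batching through a pipeline is the main design choice: one round trip per HSET is likely a large part of the 42-second average measured above.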


<div style="font-size:20px;background-color:#BE6D00;color:#F6EFE5;padding:10px;text-align:center;">
STOP YOUR CLUSTER WHEN YOU ARE NOT WORKING<br><br>
ONCE YOU ARE FINISHED, DELETE YOUR CLUSTER
</div>
<br>