## End of Module 2

#### Module 4

### 0. Setup (Mini-Tutorial)

**What is HTTP?**

HTTP (or Hypertext Transfer Protocol) is a protocol which allows the fetching of resources, such as HTML or JSON documents. It is the foundation of any data exchange on the Web and it is a client-server protocol, which means requests are initiated by the recipient, usually the Web browser. A complete document is reconstructed from the different sub-documents fetched, for instance text, layout description, images, videos, scripts, and more.

Source: [An Overview of HTTP (mozilla.org)](https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview)

**What is an Application Programming Interface (API)?**

An application program interface (API) is a set of routines, protocols, and tools for building software applications. Basically, an API specifies how software components should interact. Additionally, APIs are used when programming graphical user interface (GUI) components.

Source: [Webopedia](https://www.webopedia.com/TERM/A/API.html)



#### Python Requests Module

Requests is a simple HTTP library for Python. The full documentation is found [here](https://requests.readthedocs.io/en/master/), but for this test, we don't need to go through the whole thing.

To use it, simply import it.


In [77]:
# run this cell
import requests

For this setup, we use a simple API which generates 10 random jokes returned in JSON format.

In [78]:
# run this cell to check if you can call APIs

# First, determine the API URL
api_url = "https://official-joke-api.appspot.com/random_ten"

# Next, invoke the URL via requests.get(url)
data = requests.get(api_url)

# Let's inspect what requests.get() returns
print(data)
print(type(data))

<Response [200]>
<class 'requests.models.Response'>


The call to requests.get(url) returns a **requests.models.Response** object. What you can see from above is the HTTP Response, and if everything is normal, the status code should be **200**.
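On a real response, the integer status code is available as `data.status_code`. As a small offline aside, the sketch below maps a status code to its standard name using Python's built-in `http.HTTPStatus` enum. The helper name `describe_status` is hypothetical and only serves this illustration; no network call is made.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short, human-readable description of an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for codes the enum does not know
    outcome = "success" if 200 <= code < 300 else "not a success"
    return f"{code} {status.phrase} ({outcome})"

print(describe_status(200))
print(describe_status(404))
```

Codes in the 2xx range indicate success, which is why a normal call shows **200**.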

In [79]:
type(data)

requests.models.Response

HTTP normally returns text which Python can then process as a string. The text body may be found in the Response's attribute **text**.

In [80]:
# run this cell

print(data.text)

[{"id":72,"type":"programming","setup":"I was gonna tell you a joke about UDP...","punchline":"...but you might not get it."},{"id":27,"type":"programming","setup":"['hip', 'hip']","punchline":"(hip hip array)"},{"id":55,"type":"general","setup":"Why is peter pan always flying?","punchline":"Because he neverlands"},{"id":221,"type":"general","setup":"What do you call a group of killer whales playing instruments?","punchline":"An Orca-stra."},{"id":377,"type":"programming","setup":"Knock-knock.","punchline":"A race condition. Who is there?"},{"id":199,"type":"general","setup":"What do I look like?","punchline":"A JOKE MACHINE!?"},{"id":94,"type":"general","setup":"Did you hear about the new restaurant on the moon?","punchline":"The food is great, but there’s just no atmosphere."},{"id":71,"type":"general","setup":"I couldn't get a reservation at the library...","punchline":"They were fully booked."},{"id":275,"type":"general","setup":"What type of music do balloons hate?","punchline":"P

You should see a JSON Array of Jokes. A JSON Array looks like a Python List, but at this point, Python still treats it as a string. To process it in Python, you need to convert it to the equivalent advanced data type via the **json** module. Let's do that next and define the variable **jokes** to contain the data converted from JSON.

In [81]:
# run this cell

import json

jokes = json.loads(data.text)

print("Data:")
print(jokes)
print("Type:")
print(type(jokes))


Data:
[{'id': 72, 'type': 'programming', 'setup': 'I was gonna tell you a joke about UDP...', 'punchline': '...but you might not get it.'}, {'id': 27, 'type': 'programming', 'setup': "['hip', 'hip']", 'punchline': '(hip hip array)'}, {'id': 55, 'type': 'general', 'setup': 'Why is peter pan always flying?', 'punchline': 'Because he neverlands'}, {'id': 221, 'type': 'general', 'setup': 'What do you call a group of killer whales playing instruments?', 'punchline': 'An Orca-stra.'}, {'id': 377, 'type': 'programming', 'setup': 'Knock-knock.', 'punchline': 'A race condition. Who is there?'}, {'id': 199, 'type': 'general', 'setup': 'What do I look like?', 'punchline': 'A JOKE MACHINE!?'}, {'id': 94, 'type': 'general', 'setup': 'Did you hear about the new restaurant on the moon?', 'punchline': 'The food is great, but there’s just no atmosphere.'}, {'id': 71, 'type': 'general', 'setup': "I couldn't get a reservation at the library...", 'punchline': 'They were fully booked.'}, {'id': 275, 'type'

You can see from above that `jokes` is a list of dictionaries. Each dictionary will have various keys. Run the next cell to do a simple list comprehension operation on jokes, displaying the setup and punchline.

In [82]:
[j["setup"] + " " + j["punchline"] for j in jokes]

['I was gonna tell you a joke about UDP... ...but you might not get it.',
 "['hip', 'hip'] (hip hip array)",
 'Why is peter pan always flying? Because he neverlands',
 'What do you call a group of killer whales playing instruments? An Orca-stra.',
 'Knock-knock. A race condition. Who is there?',
 'What do I look like? A JOKE MACHINE!?',
 'Did you hear about the new restaurant on the moon? The food is great, but there’s just no atmosphere.',
 "I couldn't get a reservation at the library... They were fully booked.",
 'What type of music do balloons hate? Pop music!',
 'The punchline often arrives before the set-up. Do you know the problem with UDP jokes?']
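If the joke API is unreachable, the same steps can be tried offline. The sketch below is only an illustration: it copies two jokes from the sample response above into a string (the variable names `raw`, `sample_jokes`, and `programming` are made up for this example), converts it with `json.loads`, and then filters the resulting list of dictionaries by the `type` key.

```python
import json

# Two jokes copied from the sample response above, as a raw JSON string
raw = (
    '[{"id":72,"type":"programming",'
    '"setup":"I was gonna tell you a joke about UDP...",'
    '"punchline":"...but you might not get it."},'
    '{"id":55,"type":"general",'
    '"setup":"Why is peter pan always flying?",'
    '"punchline":"Because he neverlands"}]'
)

sample_jokes = json.loads(raw)  # JSON array -> Python list of dicts
print(type(sample_jokes), type(sample_jokes[0]))

# Keep only the jokes whose "type" is "programming"
programming = [j for j in sample_jokes if j["type"] == "programming"]
print(programming[0]["setup"], programming[0]["punchline"])
```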

### 1 Financial Data API


**(50 Points)**

Stock Market Data

IEX Cloud is a platform that makes financial data and services accessible to everyone.

We shall be using their **sandbox** (playground) API environment for this test. If you wish to play around with real data, feel free to open an account with them at http://iexcloud.io. You will be issued an API token which you will then need to use in your API calls. 

For now, we are provided with a Sandbox Token.


In [83]:
# execute this cell
TOKEN="Tsk_b1e3203fb628428fb3f967dbd3dc2b0b"
finance_url="https://sandbox.iexapis.com/stable/tops?token={TOKEN}&symbols={TICKER}"



#### 1.1.

Define a function `formatted_url` that accepts a stock symbol and returns the formatted URL call.

Hint: use the string method `.format(...)` to fill in the placeholders of the string template above.

In [84]:
# define a function formatted_url that accepts a stock symbol and returns the formatted URL call


def formatted_url(symbol):
    # write function code below
    TOKEN="Tsk_b1e3203fb628428fb3f967dbd3dc2b0b"
    finance_url="https://sandbox.iexapis.com/stable/tops?token={}&symbols={}".format(TOKEN,symbol)
    return finance_url
    
# dump string below
print(formatted_url("AAPL"))


https://sandbox.iexapis.com/stable/tops?token=Tsk_b1e3203fb628428fb3f967dbd3dc2b0b&symbols=AAPL
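A possible variant, shown only as a sketch: because the template uses the named placeholders `{TOKEN}` and `{TICKER}`, `.format(...)` can fill them with keyword arguments instead of rebuilding the URL string. The function name `formatted_url_from_template` is hypothetical, chosen so it does not clash with `formatted_url`.

```python
TOKEN = "Tsk_b1e3203fb628428fb3f967dbd3dc2b0b"
finance_url = "https://sandbox.iexapis.com/stable/tops?token={TOKEN}&symbols={TICKER}"

def formatted_url_from_template(symbol):
    # Each named placeholder is filled by the keyword argument of the same name
    return finance_url.format(TOKEN=TOKEN, TICKER=symbol)

print(formatted_url_from_template("AAPL"))
```

Keeping the template in one place means the endpoint only has to be changed once if it ever moves.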


#### 1.2.

Test the API call with your URL.

The sample output below uses Google (GOOG). You may want to try other stock symbols like Apple (AAPL), Amazon (AMZN), or Netflix (NFLX).

**NOTE:** The prices below may be different from what you see depending on the time you made the request.

In [85]:
# write code 
# build the URL with the function from 1.1 instead of hard-coding it
api_url = formatted_url("GOOG")
data2 = requests.get(api_url)
print(data2)

# dump text attribute of your response object (here, we assume your variable name is data2)
data2.text

<Response [200]>


'[{"symbol":"GOOG","sector":"ionshectlcrseogvey","securityType":"cs","bidPrice":0,"bidSize":0,"askPrice":0,"askSize":0,"lastUpdated":1599444216220,"lastSalePrice":1540.68,"lastSaleSize":101,"lastSaleTime":1602518811402,"volume":37439}]'

### 1.2 Get a list of stocks

Get stock prices of the following:
* Apple (AAPL)
* Amazon (AMZN)
* Google (GOOG)
* Netflix (NFLX)

The symbols are already in the list variable `portfolio`.


In [86]:
# execute this code
portfolio = ["AAPL","AMZN","GOOG","NFLX"]

The URL format is like so:

https://sandbox.iexapis.com/stable/stock/market/batch?symbols=aapl,fb&types=quote&token=Tsk_b1e3203fb628428fb3f967dbd3dc2b0b



In [87]:
# execute this code
TOKEN="Tsk_b1e3203fb628428fb3f967dbd3dc2b0b"
market_url="https://sandbox.iexapis.com/stable/stock/market/batch?symbols={SYMBOLS}&types=quote&token={TOKEN}"

#### 1.2.

Define a function `market_formatted_url` that accepts a **list of stock symbols** (similar to `portfolio`) and returns the formatted URL call.

Hint: use the string method `.format(...)` to fill in the placeholders of the string template above.  
Hint: craft a string with comma-separated symbols based on the portfolio list.  
Hint: look up `str.join(list)` to generate a comma-separated string.  
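As a quick illustration of the last hint, the sketch below shows `str.join` on its own. The separator string calls `join`, and every element must already be a string. The `ids` list is a made-up example, not part of the exercise.

```python
portfolio = ["AAPL", "AMZN", "GOOG", "NFLX"]

# The separator goes first; the list of strings is the argument
symbols = ",".join(portfolio)
print(symbols)  # AAPL,AMZN,GOOG,NFLX

# join() only accepts strings, so other types must be converted first
ids = [72, 27, 55]
id_string = ",".join(str(i) for i in ids)
print(id_string)  # 72,27,55
```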

In [88]:
# write code below
def market_formatted_url(symbols):
    # write the rest of the function code below
    TOKEN="Tsk_b1e3203fb628428fb3f967dbd3dc2b0b"
    market_url="https://sandbox.iexapis.com/stable/stock/market/batch?symbols={}&types=quote&token={}".format(",".join(symbols),TOKEN)
    return market_url
    
# dump string below
print(market_formatted_url(portfolio))

https://sandbox.iexapis.com/stable/stock/market/batch?symbols=AAPL,AMZN,GOOG,NFLX&types=quote&token=Tsk_b1e3203fb628428fb3f967dbd3dc2b0b


#### 1.3.

Retrieve the API data using `portfolio`.


In [89]:
# write code to call the API here
# Hint: use your new function market_formatted_url. 
# Hint: pass the portfolio list already defined for you.

market_data = requests.get(market_formatted_url(portfolio))

# Dump the text attribute of the response object. Sample output provided below (assuming you name your response variable market_data)
market_data.text

'{"AAPL":{"quote":{"symbol":"AAPL","companyName":"Apple, Inc.","primaryExchange":"NQDAAS","calculationPrice":"close","open":null,"openTime":null,"openSource":"iafilcof","close":null,"closeTime":null,"closeSource":"fciaofli","high":null,"highTime":1663841200046,"highSource":"iepn dc edieu5let 1amry","low":null,"lowTime":1611517906698,"lowSource":"dea peniil 1cdtr 5eemyu","latestPrice":386.53,"latestSource":"Close","latestTime":"July 28, 2020","latestUpdate":1656895171008,"latestVolume":null,"iexRealtimePrice":373.99,"iexRealtimeSize":4,"iexLastUpdated":1667151383471,"delayedPrice":null,"delayedPriceTime":null,"oddLotDelayedPrice":null,"oddLotDelayedPriceTime":null,"extendedPrice":null,"extendedChange":null,"extendedChangePercent":null,"extendedPriceTime":null,"previousClose":395.96,"previousVolume":30765000,"change":-6.34,"changePercent":-0.01671,"volume":null,"iexMarketPercent":0.005949609313059389,"iexVolume":155422,"avgTotalVolume":35966922,"iexBidPrice":0,"iexBidSize":0,"iexAskPrice

#### 1.4.

Load the JSON string into a variable named `market_quotes`.

In [90]:
import json

# write code here
# Hint: what attribute of the response object contains the JSON string?
market_quotes = json.loads(market_data.text)


# dump market_quotes (interactive mode). Sample output below
market_quotes


{'AAPL': {'quote': {'symbol': 'AAPL',
   'companyName': 'Apple, Inc.',
   'primaryExchange': 'NQDAAS',
   'calculationPrice': 'close',
   'open': None,
   'openTime': None,
   'openSource': 'iafilcof',
   'close': None,
   'closeTime': None,
   'closeSource': 'fciaofli',
   'high': None,
   'highTime': 1663841200046,
   'highSource': 'iepn dc edieu5let 1amry',
   'low': None,
   'lowTime': 1611517906698,
   'lowSource': 'dea peniil 1cdtr 5eemyu',
   'latestPrice': 386.53,
   'latestSource': 'Close',
   'latestTime': 'July 28, 2020',
   'latestUpdate': 1656895171008,
   'latestVolume': None,
   'iexRealtimePrice': 373.99,
   'iexRealtimeSize': 4,
   'iexLastUpdated': 1667151383471,
   'delayedPrice': None,
   'delayedPriceTime': None,
   'oddLotDelayedPrice': None,
   'oddLotDelayedPriceTime': None,
   'extendedPrice': None,
   'extendedChange': None,
   'extendedChangePercent': None,
   'extendedPriceTime': None,
   'previousClose': 395.96,
   'previousVolume': 30765000,
   'change': -

#### 1.5.

Output the Last Price (`latestPrice`) per Symbol like so:

In [91]:
# Write code here. Sample output provided below but may vary depending on the time you made the request.
# No need to define a function for this step.

for i in market_quotes:
    print(i, "\t", market_quotes[i]["quote"]["latestPrice"])



AAPL 	 386.53
AMZN 	 3134.43
GOOG 	 1517.46
NFLX 	 512.32


Hints:
- use a tab ("\t") between the symbol and the price
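An alternative way to write the same loop, offered as a sketch: iterate over `.items()` to get the symbol and its nested data together, and build each line with an f-string. The dictionary `sample_quotes` is a hand-trimmed stand-in for `market_quotes`, using two prices from the sample output above.

```python
# Trimmed stand-in for market_quotes (only the keys this step needs)
sample_quotes = {
    "AAPL": {"quote": {"symbol": "AAPL", "latestPrice": 386.53}},
    "AMZN": {"quote": {"symbol": "AMZN", "latestPrice": 3134.43}},
}

# .items() yields (key, value) pairs, so no second lookup by key is needed
lines = [
    f"{symbol}\t{info['quote']['latestPrice']}"
    for symbol, info in sample_quotes.items()
]
print("\n".join(lines))
```

Unlike `print(a, "\t", b)`, the f-string puts only the tab between the two values, without the extra spaces that `print` adds between its arguments.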


### 2 Social Listening

**(50 points)**

Social listening is the monitoring of a brand, a personality, a cause, or an idea for feedback, direct mentions of entities involved, sentiments, and discussions regarding related keywords (or hashtags), topics, competitors, haters, or industries, followed by an analysis of this data to gain further insights and actionable next steps.

Before we go to the questions, here are a few things not yet taught in class that you will use in your solutions.

Sorting lists using `sort()`

Say you have the following list:

In [92]:
# execute this cell

demolist = ['D','A','F','B','E','C']

Use `sort()` to put elements in order in the **same** list.

In [93]:
# execute this cell 

demolist.sort()
print(demolist)

['A', 'B', 'C', 'D', 'E', 'F']


But suppose you have a list whose elements are of an advanced data type, say, tuples...

In [94]:
demolist2 = [('A',59),('B',100),('C',20),('D',88),('E',25),('F',38)]

... and you wish to sort this list by the number in the second element of each tuple in descending order.

You use `sort()` with two parameters:
* `key` indicates the function that returns the value that will serve as the basis for sorting
* `reverse` takes on a boolean value (default is False) to specify whether the list will be in ascending or descending order

To sort demolist2 by the number indicated as the second element of each tuple, and in descending order, execute the following cell.

In [95]:
# execute this cell

demolist2.sort(key=lambda x: x[1], reverse=True)
print(demolist2)


[('B', 100), ('D', 88), ('A', 59), ('F', 38), ('E', 25), ('C', 20)]


Note that I used lambda above but nothing is stopping you from defining a full-blown function like so:

In [96]:
demolist2 = [('A',59),('B',100),('C',20),('D',88),('E',25),('F',38)]

def sort_tuple(x):
    return x[1]

# note that you only pass the function reference (without the parentheses and parameters)
demolist2.sort(key=sort_tuple, reverse=True)
print(demolist2)

[('B', 100), ('D', 88), ('A', 59), ('F', 38), ('E', 25), ('C', 20)]
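A related option, sketched here as an aside: the built-in `sorted()` accepts the same `key` and `reverse` parameters but returns a new list and leaves the original untouched, and `operator.itemgetter(1)` can stand in for the lambda. The variable names `scores` and `ranked` are made up for this example.

```python
from operator import itemgetter

scores = [('A', 59), ('B', 100), ('C', 20)]

# sorted() returns a new list; itemgetter(1) picks the second tuple element
ranked = sorted(scores, key=itemgetter(1), reverse=True)

print(ranked)  # [('B', 100), ('A', 59), ('C', 20)]
print(scores)  # [('A', 59), ('B', 100), ('C', 20)]  (original order kept)
```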


### 2.1

Consider a Twitter search for 100 tweets (the limit imposed by Twitter) using the keyword "SONA". The resulting JSON dump from the Twitter API call has been pre-saved in the file named `tweets.json`. Let's load the file and convert the JSON string into a list of dictionaries, one per tweet.

In [97]:
# execute this cell to load the file
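tweets_file = 'tweets.json'

# use the variable defined above; utf-8 is assumed since tweets contain non-ASCII text
with open(tweets_file, "r", encoding="utf-8") as json_file:
    tweets = json.loads(json_file.read())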

# Dump contents of tweets to make sure things are okay.
tweets

[{'created_at': 'Tue Jul 28 03:25:35 +0000 2020',
  'id': 1287952316029521920,
  'id_str': '1287952316029521920',
  'full_text': 'RT @watchmejayjay: Ay di ka nakanood ng SONA? Wag ka mag-alala, ito yung summary ng mga plano ni tatay laban sa COVID 19: A thread',
  'truncated': False,
  'display_text_range': [0, 130],
  'entities': {'hashtags': [],
   'symbols': [],
   'user_mentions': [{'screen_name': 'watchmejayjay',
     'name': 'Kuya Sir #JUNKTERRORLAWNOW',
     'id': 2148463897,
     'id_str': '2148463897',
     'indices': [3, 17]}],
   'urls': []},
  'metadata': {'iso_language_code': 'tl', 'result_type': 'recent'},
  'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str': None,
  'in_reply_to_screen_name': None,
  'user': {'id': 1334271631,
   'id_str': '1334271631',
   'name': 'kai',
   'screen_name': '_kyladel',

You won't really need to understand everything in the tweet object, but in case you want to know more, here is the documentation from the [Twitter Developer Website](https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object).

#### 2.1.1.

Print out the number of tweets. Expected output is shown below.

In [98]:
# write your code here
print(len(tweets))




100


#### 2.1.2.

Define a list variable named `tweet_texts` containing the text of each tweet in `tweets` (the value of each tweet's `full_text` key).

Do a printout (using **script mode**) of the contents of `tweet_texts`. You will have to loop through each element of the list.

Sample output is provided below.

In [99]:
# write code here.
tweet_texts = [tweet["full_text"] for tweet in tweets]


# Dump tweet_texts (script mode)
print(tweet_texts)

['RT @watchmejayjay: Ay di ka nakanood ng SONA? Wag ka mag-alala, ito yung summary ng mga plano ni tatay laban sa COVID 19: A thread', 'RT @TOWER_Namba: 【#ジェジュン】\n\n\\カバーアルバムを引っさげたツアー映像🎥/\n\nツアー映像『J-JUN LIVE 2019～Love Covers～』本日入荷しました🙌\n2日間のみのプレミアムライヴを映像とCDでお楽しみいただける豪華盤!!!\n\nCDがあれば…', '@kapilkolotya @kharaa_sona_ Bhai jo le liya hai mene 3saal pehle usko me Boycott kr duga. But\n\nMujy Made in india IPHONE dedo.', 'RT @TOWER_Namba: 【#ジェジュン】\n\n\\ジェジュンの優しく、深い歌声がしみわたる..../\n\nカバーアルバム『Love Covers Ⅱ』本日入荷しました🙌\n原曲の良さをひきたてるだけでなく、ジェジュンらしさも全開のアルバムです♪\n\n🎁タワレコ特典はアナザージャ…', 'RT @JaDineTrash: Her own time. \nHer own thoughts. \nHer own style. \n\nThank you, Nadine. The battles you fight have always been for a bigger…', 'RT @Absolutalbert: VP isn’t invited to SONA? \n\nDesignated Survivor.', 'RT @cnnphilippines: THREAD: Duterte pushed for the reimposition of capital punishment, specifying the method of lethal injection, for heino…', 'RT @quarkhenares: WHY. WHY DO THEY EVEN GET FILMMAKERS TO SHOOT 

#### 2.1.3.

Remove duplicate tweet text lines and save the resulting (cleaned-up) list in the same variable `tweet_texts`.

Hint: using sets may save you a bit of headache.

Sample output is provided below for your reference.

In [100]:
# write code here
tweet_texts = set(tweet_texts)

# dump, assuming variable name is tweet_texts
print(tweet_texts)

# note that after "de-duplication" (removing of duplicates), we should only have 90 tweet text lines left.
print(len(tweet_texts))

{'@serahphymn Totally a 5 🥺 If my sona met him they would totally vibe', 'RT @cnnphilippines: THREAD: Duterte pushed for the reimposition of capital punishment, specifying the method of lethal injection, for heino…', '@Chaiharvest OPEN YOUR PURSE SHOW ME YOUR SONA', 'Difficult step .\nMere liye kar lega rohit....me kar lunga sona ko step pasand he 🙂.\n\n#KahaanHumKahaanTum #Ronakshi \n #KarPika #KaranVGrover #DipikaKakar \nTum jao ghar pe sona mujhe sikha degi 😂😂😂 \nMujhe maza aaya tha 😂😂 https://t.co/aqRFuujs0I', '@Neko_Sona https://t.co/HKJzWDY1x5', 'RT @inquirerdotnet: LOOK: Groups from different sectors gather at UP Diliman, Quezon City on Monday, July 27, to protest hours before the f…', "This health emergency stretched the government’s resources to its limits. In response, the Office of the President worked closely with Congress for the quick passage of the Bayanihan to Heal as One Act.\n\n-President Rodrigo Roa #Duterte's 5th #SONA", 'RT @JaDineTrash: Her own time. \nHer own tho
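Note that a set does not keep the original order and is not a list. If a list in first-seen order is wanted, one possible approach (a sketch, not the required answer) relies on the fact that dictionary keys are unique and keep insertion order. The sample strings below are made up for illustration.

```python
texts = ["RT a", "hello", "RT a", "world", "hello"]

# dict.fromkeys() keeps one key per distinct value, in first-seen order
unique_texts = list(dict.fromkeys(texts))

print(unique_texts)       # ['RT a', 'hello', 'world']
print(len(unique_texts))  # 3
```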

### 2.2.

Here we build our wordcount statistics.

#### 2.2.1.

Let's store our count stats in a dictionary variable named `words_dict`.

For each line, let's remove special characters contained in the following string:

`"&$@[].,'#()-\"!?’_"`

Print the resulting dictionary. Sample output is provided below.

In [101]:
words_dict = dict()

remove_chars = "&$@[].,'#()-\"!?’_{};"

# Write code below
# use as many lines as needed but within this same cell only.
# -----------------------------------------------------------
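# Note: str() on the collection yields its printed representation, so newlines
# inside tweets become the two literal characters "\n" and stay glued to words.
text = str(tweet_texts)
for ch in remove_chars:
    text = text.replace(ch, "")

# Upper-case and split once, after all special characters are removed
words = text.upper().split()

for word in words:
    words_dict[word] = words_dict.get(word, 0) + 1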

# -----------------------------------------------------------
# Dump dictionary contents here        
print(words_dict)

{'SERAHPHYMN': 1, 'TOTALLY': 2, 'A': 6, '5': 2, '🥺': 1, 'IF': 3, 'MY': 8, 'SONA': 61, 'MET': 1, 'HIM': 3, 'THEY': 4, 'WOULD': 3, 'VIBE': 1, 'RT': 53, 'CNNPHILIPPINES:': 2, 'THREAD:': 1, 'DUTERTE': 8, 'PUSHED': 1, 'FOR': 11, 'THE': 55, 'REIMPOSITION': 1, 'OF': 30, 'CAPITAL': 1, 'PUNISHMENT': 1, 'SPECIFYING': 1, 'METHOD': 1, 'LETHAL': 1, 'INJECTION': 1, 'HEINO…': 1, 'CHAIHARVEST': 1, 'OPEN': 1, 'YOUR': 2, 'PURSE': 1, 'SHOW': 1, 'ME': 3, 'DIFFICULT': 1, 'STEP': 2, '\\NMERE': 1, 'LIYE': 1, 'KAR': 2, 'LEGA': 1, 'ROHITME': 1, 'LUNGA': 1, 'KO': 4, 'PASAND': 1, 'HE': 2, '🙂\\N\\NKAHAANHUMKAHAANTUM': 1, 'RONAKSHI': 1, '\\N': 1, 'KARPIKA': 1, 'KARANVGROVER': 1, 'DIPIKAKAKAR': 1, '\\NTUM': 1, 'JAO': 1, 'GHAR': 1, 'PE': 1, 'MUJHE': 1, 'SIKHA': 1, 'DEGI': 1, '😂😂😂': 1, '\\NMUJHE': 1, 'MAZA': 1, 'AAYA': 1, 'THA': 3, '😂😂': 1, 'HTTPS://TCO/AQRFUUJS0I': 1, 'NEKOSONA': 1, 'HTTPS://TCO/HKJZWDY1X5': 1, 'INQUIRERDOTNET:': 1, 'LOOK:': 2, 'GROUPS': 2, 'FROM': 3, 'DIFFERENT': 1, 'SECTORS': 1, 'GATHER': 1, 'AT':
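An alternative sketch of the same idea, not the expected answer for this exercise: `str.translate` with a table from `str.maketrans` deletes all special characters in one pass, and `collections.Counter` does the counting. It processes each tweet string separately, so real newlines count as whitespace when splitting. The two sample lines are taken from the tweets shown earlier; the names `strip_table`, `lines`, and `counts` are made up for this example.

```python
from collections import Counter

remove_chars = "&$@[].,'#()-\"!?’_"

# With three arguments, maketrans marks every character in the third for deletion
strip_table = str.maketrans("", "", remove_chars)

lines = [
    "@Chaiharvest OPEN YOUR PURSE SHOW ME YOUR SONA",
    "RT @Absolutalbert: VP isn’t invited to SONA? \n\nDesignated Survivor.",
]

counts = Counter()
for line in lines:
    cleaned = line.translate(strip_table).upper()
    counts.update(cleaned.split())  # split() treats newlines as whitespace

print(counts["SONA"], counts["YOUR"])  # 2 2
```

`Counter` is a dictionary subclass, so `counts.items()` can be used the same way as the items of `words_dict`.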

#### 2.2.2.

Now, we would like to have a list of words and counts sorted by count in descending order.

Define a new list variable named `words_list` containing **tuples**, where each tuple is as follows:

*(word, count)*

Dump the contents of `words_list` (in script mode).

Sample output is provided below.

In [102]:
# write code here
words_list = list(words_dict.items())  # items() already yields (word, count) tuples

# dump words_list here, assuming variable name is words_list
print(words_list)



[('SERAHPHYMN', 1), ('TOTALLY', 2), ('A', 6), ('5', 2), ('🥺', 1), ('IF', 3), ('MY', 8), ('SONA', 61), ('MET', 1), ('HIM', 3), ('THEY', 4), ('WOULD', 3), ('VIBE', 1), ('RT', 53), ('CNNPHILIPPINES:', 2), ('THREAD:', 1), ('DUTERTE', 8), ('PUSHED', 1), ('FOR', 11), ('THE', 55), ('REIMPOSITION', 1), ('OF', 30), ('CAPITAL', 1), ('PUNISHMENT', 1), ('SPECIFYING', 1), ('METHOD', 1), ('LETHAL', 1), ('INJECTION', 1), ('HEINO…', 1), ('CHAIHARVEST', 1), ('OPEN', 1), ('YOUR', 2), ('PURSE', 1), ('SHOW', 1), ('ME', 3), ('DIFFICULT', 1), ('STEP', 2), ('\\NMERE', 1), ('LIYE', 1), ('KAR', 2), ('LEGA', 1), ('ROHITME', 1), ('LUNGA', 1), ('KO', 4), ('PASAND', 1), ('HE', 2), ('🙂\\N\\NKAHAANHUMKAHAANTUM', 1), ('RONAKSHI', 1), ('\\N', 1), ('KARPIKA', 1), ('KARANVGROVER', 1), ('DIPIKAKAKAR', 1), ('\\NTUM', 1), ('JAO', 1), ('GHAR', 1), ('PE', 1), ('MUJHE', 1), ('SIKHA', 1), ('DEGI', 1), ('😂😂😂', 1), ('\\NMUJHE', 1), ('MAZA', 1), ('AAYA', 1), ('THA', 3), ('😂😂', 1), ('HTTPS://TCO/AQRFUUJS0I', 1), ('NEKOSONA', 1), (

#### 2.2.3

Sort the list by count (which is the second element of each tuple) in **descending** (or reverse) order.

Dump the contents of the newly-sorted list `words_list`. 

Sample output shown below.

Hint: this can be done using `lambda` but you can use a regular function definition. Make sure you go through the mini-tutorial at the start of Problem Set 2.

In [103]:
# write code here
words_list.sort(key=lambda x: x[1], reverse=True)



# dump contents

print(words_list)

[('SONA', 61), ('THE', 55), ('RT', 53), ('TO', 34), ('SA', 34), ('OF', 30), ('ANG', 20), ('NA', 19), ('NG', 14), ('5TH', 13), ('AND', 13), ('MGA', 12), ('FOR', 11), ('IN', 11), ('DUTERTES', 10), ('WE', 9), ('MY', 8), ('DUTERTE', 8), ('WITH', 8), ('THIS', 7), ('COVID19', 7), ('NI', 7), ('2020', 7), ('THAT', 7), ('A', 6), ('MAY', 6), ('I', 6), ('AMP', 6), ('UP', 5), ('ON', 5), ('PA', 5), ('HIS', 5), ('TAG', 5), ('IS', 5), ('THEY', 4), ('KO', 4), ('RODRIGO', 4), ('ROA', 4), ('YOU', 4), ('FIRST', 4), ('SI', 4), ('PRES', 4), ('DI', 4), ('KA', 4), ('RIN', 4), ('YUNG', 4), ('YA', 4), ('STUDENT', 4), ('WAY', 4), ('IF', 3), ('HIM', 3), ('WOULD', 3), ('ME', 3), ('THA', 3), ('FROM', 3), ('BEFORE', 3), ('ITS', 3), ('PRESIDENT', 3), ('CONGRESS', 3), ('ONE', 3), ('OWN', 3), ('GLOBE', 3), ('LANG', 3), ('ARAW', 3), ('PRESS', 3), ('NAMAN', 3), ('RAPPLERDOTCOM:', 3), ('EVERYONE', 3), ('STATE', 3), ('DURING', 3), ('COUNCIL', 3), ('SILA', 3), ('KHARAASONA', 3), ('DAPAT', 3), ('TOTALLY', 2), ('5', 2), ('CN

#### 2.2.4.

Print out the top 5 words (based on count).

Take note of the formatting below (i.e. one line per print output).

Sample output shown below, with the index shown as the leftmost element (the integer starting with 1).

Hint: No need to create a special index variable and manually increment. You may use one of the many `for` loop constructs for automatic index variable generation. You don't have to, but it's easier.

In [104]:
# write code here
for i in range(5):
    print(i + 1, words_list[i])

1 ('SONA', 61)
2 ('THE', 55)
3 ('RT', 53)
4 ('TO', 34)
5 ('SA', 34)
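The hint about automatic index generation can be sketched with `enumerate`, which pairs each element with a running number and accepts a `start` value. The list `sample` below copies the first six entries of the sorted output above; the unpacked formatting is just one option and differs from the tuple-style printout shown.

```python
sample = [('SONA', 61), ('THE', 55), ('RT', 53), ('TO', 34), ('SA', 34), ('OF', 30)]

top_lines = []
# Slice the first five entries; enumerate numbers them starting at 1
for rank, (word, count) in enumerate(sample[:5], start=1):
    top_lines.append(f"{rank} {word} {count}")

print("\n".join(top_lines))
```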


#### 2.2.5.

Write a new **csv** file `wordcount.csv` with format like so:

```
word,count
SONA,66
THE,56
RT,54
TO,37
SA,35
...
```

Hint: You may use plain old string concatenation for writing to file, but feel free to experiment with other options.

In [105]:
import csv

# write code here
# newline='' lets the csv module control line endings itself
with open('wordcount.csv', 'w', encoding='utf-8', newline='') as wordfile:
    # write the rest of your code here
    csvwriter = csv.writer(wordfile)
    csvwriter.writerow(['word', 'count'])  # header row

    for row in words_list:
        csvwriter.writerow(row)
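Before comparing files, it can help to inspect the CSV text in memory. The sketch below writes to an `io.StringIO` buffer instead of a file; `sample` is a two-row stand-in for `words_list`. The `lineterminator="\n"` argument is an assumption made for readability here: the `csv` module ends rows with `\r\n` by default, which may matter when diffing against a reference file.

```python
import csv
import io

sample = [('SONA', 61), ('THE', 55)]

buffer = io.StringIO()  # in-memory text file
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["word", "count"])  # header row
writer.writerows(sample)            # one CSV row per (word, count) tuple

csv_text = buffer.getvalue()
print(csv_text)
```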

Perform a `diff` or `fc` operation between your output file and the file named `wordcount-test.csv` which you can download from Canvas. Make sure there are no differences.

**IMPORTANT:** Please make sure that you commit `wordcount.csv` together with your Jupyter Notebook in your GitHub repository.
