## Instructions {-}

1. You may talk to a friend and discuss the questions and potential directions for solving them. However, you must write your own solutions and code separately, not as a group activity.

2. Do not write your name on the assignment.

3. Write your code in the *Code* cells of the Jupyter notebook. Ensure that the solution is written neatly enough to understand and grade.

4. Use [Quarto](https://quarto.org/docs/output-formats/html-basics.html) to print the *.ipynb* file as HTML. You will need to open the command prompt, navigate to the directory containing the file, and use the command: `quarto render filename.ipynb --to html`. Submit the HTML file.

5. There are 5 points for cleanliness and organization. The breakdown is as follows:

- Must be an HTML file rendered using Quarto (1.5 pts).  

- There aren’t excessively long outputs of extraneous information (e.g. no printouts of unnecessary results without good reason, there aren’t long printouts of which iteration a loop is on, there aren’t long sections of commented-out code, etc.) (1 pt)

- There is no piece of unnecessary / redundant code, and no unnecessary / redundant text (1 pt)

- The code should be commented and clearly written with intuitive variable names. For example, use variable names such as number_input, factor, hours, instead of a,b,xyz, etc. (1.5 pts)

6. The assignment is worth 100 points, and is due on **29th April 2023 at 11:59 pm**. 

## GDP of The USA
USA's GDP per capita from 1960 to 2021 is given by the tuple `T` in the code cell below. The values are arranged in ascending order of the year, i.e., the first value is for 1960, the second value is for 1961, and so on. 

```python
T = (3007, 3067, 3244, 3375,3574, 3828, 4146, 4336, 4696, 5032,5234,5609,6094,6726,7226,7801,8592,9453,10565,11674,12575,13976,14434,15544,17121,18237,19071,20039,21417,22857,23889,24342,25419,26387,27695,28691,29968,31459,32854,34515,36330,37134,37998,39490,41725,44123,46302,48050,48570,47195,48651,50066,51784,53291,55124,56763,57867,59915,62805,65095,63028,69288)
```

### Gaps
Use a list comprehension to produce a list of the gaps between consecutive entries in `T`, i.e., the increase in GDP per capita with respect to the previous year. The list of gaps should look like: `[60, 177, ...]`.

*(6 points)*
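
The consecutive-difference pattern can be sketched on a small made-up tuple. The names `sample_values` and `sample_gaps` are hypothetical and not part of the assignment; this is one possible approach, and the same idea should carry over to `T`.

```python
# Hypothetical toy data, not the GDP tuple from the assignment
sample_values = (10, 13, 19, 18, 25)

# Pair each entry with its successor, then subtract the earlier from the later
sample_gaps = [later - earlier
               for earlier, later in zip(sample_values, sample_values[1:])]

print(sample_gaps)  # [3, 6, -1, 7]
```

The resulting list has one element fewer than the input, because the first entry has no predecessor.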

### Maximum gap size
Use the list of gaps developed in C.1.1 to find the maximum gap size, i.e., the maximum increase in GDP per capita.

*(2 points)*

### Gaps higher than \$1000
Using a list comprehension with the list of gaps developed in C.1.1, find the percentage of gaps that are larger than \$1000.

*(6 points)*
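
A percentage of this kind can be computed by filtering with a list comprehension and dividing the two lengths. The sketch below uses hypothetical toy values and a hypothetical `threshold`, not the assignment data.

```python
# Hypothetical toy gaps and threshold, for illustration only
sample_gaps = [3, 6, -1, 7]
threshold = 5

# Keep only the gaps that exceed the threshold
large_gaps = [gap for gap in sample_gaps if gap > threshold]

# Share of large gaps, expressed as a percentage
percent_large = 100 * len(large_gaps) / len(sample_gaps)

print(percent_large)  # 50.0
```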

### Dictionary
Create a dictionary `D`, where the `key` is the year, and `value` for the `key` is the increase in GDP per capita in that year with respect to the previous year, i.e., the gaps computed in C.1.1. 

*(6 points)*
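
One way to attach years to a list of gaps is a dictionary comprehension over `enumerate`. The names and the starting year below are hypothetical toy values. In the assignment, the first gap would belong to 1961, since 1960 has no previous year in `T`.

```python
# Hypothetical toy data: the first gap is assumed to belong to the year 2001
first_gap_year = 2001
sample_gaps = [3, 6, -1, 7]

# Map each year to its gap; offset 0 corresponds to first_gap_year
gap_by_year = {first_gap_year + offset: gap
               for offset, gap in enumerate(sample_gaps)}

print(gap_by_year)  # {2001: 3, 2002: 6, 2003: -1, 2004: 7}
```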

### Maximum increase
Use the dictionary `D` to find the year in which GDP per capita increased the most relative to the previous year. Use a list comprehension.

*(6 points)*

**Hint:** [...... for .... in D.items() if ......]
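
The pattern in the hint can be sketched on a toy dictionary: compute the largest value first, then collect the keys whose value matches it. The dictionary `gap_by_year` below is hypothetical and only stands in for `D`.

```python
# Hypothetical toy dictionary standing in for D
gap_by_year = {2001: 3, 2002: 6, 2003: -1, 2004: 7}

# Largest value in the dictionary
largest_gap = max(gap_by_year.values())

# Keys whose value equals the largest one (a list, in case of ties)
years_with_largest_gap = [year for year, gap in gap_by_year.items()
                          if gap == largest_gap]

print(years_with_largest_gap)  # [2004]
```

Changing the condition after `if` lets the same comprehension select keys by any other criterion on the value.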

### GDP per capita decrease
Use the dictionary `D` to find the years when the GDP per capita decreased with respect to the previous year. Use the list comprehension method.

*(6 points)*

## TED Talks
### Reading data
Read the file *TED_Talks.json* on TED talks using the code below. You will get the data in the object `TED_Talks_data`. Inspect the structure of `TED_Talks_data`: you will need to know how the data is organized into lists and dictionaries to answer the questions below.

Note that the data must be stored in the same directory as the notebook.

*(2 points)*

```python
import json
with open("TED_Talks.json", "r") as file:
    TED_Talks_data = json.load(file)
```
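
A few built-in calls are usually enough to explore an unfamiliar JSON object. The sketch below runs on a small made-up object, `sample_data`, whose shape and field names are assumptions for illustration; the real file may be structured differently.

```python
import json

# Hypothetical stand-in for a loaded JSON object; NOT the structure of the real file
sample_data = json.loads(
    '[{"title": "A", "tags": ["x", "y"]}, {"title": "B", "tags": []}]'
)

print(type(sample_data))         # type of the outermost container
print(len(sample_data))          # number of top-level elements

first_item = sample_data[0]
print(type(first_item))          # type of a single record
print(list(first_item.keys()))   # fields available on that record
```

Repeating the same checks one level deeper (for example on `first_item["tags"]`) shows how far the nesting goes.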

### Number of talks
Find the number of talks in the dataset.

*(2 points)*

### Popular talk
Find the `headline`, `speaker` and `year_filmed` of the talk with the highest number of `views`.

*(6 points)*

### Mean and median views
What are the mean and median number of `views` for a talk? Can we say that the majority of talks (i.e., more than 50% of the talks) have fewer `views` than the average number of `views` for a talk? Justify your answer.

*(6 points)*
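
Comparing the mean with the median gives a hint about skew, and counting directly settles what share of the values lies below the mean. The sketch below uses hypothetical toy numbers in `sample_views` and the standard-library `statistics` module; it is one possible approach, not the required one.

```python
from statistics import mean, median

# Hypothetical toy view counts with one very large value
sample_views = [100, 120, 150, 200, 5000]

mean_views = mean(sample_views)      # pulled upward by the large value
median_views = median(sample_views)  # middle value of the sorted data

# Fraction of values strictly below the mean
share_below_mean = (sum(1 for views in sample_views if views < mean_views)
                    / len(sample_views))

print(mean_views, median_views, share_below_mean)  # 1114 150 0.8
```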

### Views vs average views
Do at least 25% of the talks have more `views` than the average number of `views` for a talk? Justify your answer.

*(4 points)*

### `Confusing` talks
Find the `headline` of the talk that received the highest number of votes in the `Confusing` category.

*(8 points)*

### `Fascinating` talks
Find the `headline` and the `year_filmed` of the talk that received the highest percentage of votes in the *Fascinating* category. 

$$\text{Percentage of } \textit{Fascinating} \text{ votes for a TED talk} = \frac{\text{Number of votes in the } \textit{Fascinating} \text{ category}}{\text{Total votes in all categories}}$$

*(10 points)*
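
The ratio above can be sketched on made-up vote counts. The dictionary `sample_talks`, its layout, and the helper `category_share` are all hypothetical; the ratings in the real data may be stored differently and will need to be adapted.

```python
# Hypothetical vote counts per talk; NOT the structure of the real data
sample_talks = {
    "Talk A": {"Funny": 20, "Fascinating": 50, "Confusing": 30},
    "Talk B": {"Funny": 10, "Fascinating": 30, "Confusing": 10},
}

def category_share(votes, category):
    """Fraction of all votes that fall in the given category."""
    return votes[category] / sum(votes.values())

# Share of 'Fascinating' votes for each talk
fascinating_shares = {title: category_share(votes, "Fascinating")
                      for title, votes in sample_talks.items()}

# Talk with the highest share
best_title = max(fascinating_shares, key=fascinating_shares.get)

print(fascinating_shares)  # {'Talk A': 0.5, 'Talk B': 0.6}
print(best_title)          # Talk B
```

In this toy data, "Talk A" has more *Fascinating* votes in absolute terms, but "Talk B" has the higher share, which is why the count and the percentage can single out different talks.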

## Poker

The object `deck` defined below corresponds to a deck of cards. Estimate the probability that a five-card hand will be:

1. Straight

2. Three-of-a-kind 

3. Two-pair

4. One-pair

5. High card

You may check the meaning of the above terms [here](https://en.wikipedia.org/wiki/List_of_poker_hands).

*(25 points)*

**Hint:**

Estimate these probabilities as follows.

1. Write a function that accepts a hand of 5 cards as its argument and returns relevant characteristics of the hand, such as the number of distinct card values, the maximum number of occurrences of a value, etc. Using the values returned by this function (perhaps in a dictionary), you can determine whether the hand is any of the above types *(Straight / Three-of-a-kind / two-pair / one-pair / high card)*. 


2. Randomly pull a hand of 5 cards from the `deck`. Call the function developed in (1) to get the relevant characteristics of the hand. Use those characteristics to determine if the hand is one of the five mentioned types *(Straight / Three-of-a-kind / two-pair / one-pair / high card)*. 


3. Repeat (2) 10,000 times.


4. Estimate the probability of the hand being of the above five mentioned types *(Straight / Three-of-a-kind / two-pair / one-pair / high card)* from the results of the 10,000 simulations.

You may use the function `shuffle()` from the library `random` to shuffle the deck every time before pulling a hand of 5 cards.
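
The steps in the hint can be sketched for a single hand type. The code below classifies only one-pair hands and leaves the other four types out. The helper names `hand_characteristics` and `is_one_pair` are hypothetical, and `random.sample` is used here in place of `shuffle()` followed by slicing; both draw 5 cards without replacement.

```python
import random
from collections import Counter

# Same deck definition as in the assignment, repeated so the sketch is self-contained
deck = [{'value': i, 'suit': c}
        for c in ['spades', 'clubs', 'hearts', 'diamonds']
        for i in range(2, 15)]

def hand_characteristics(hand):
    """Summarize a 5-card hand (hypothetical helper)."""
    values = sorted(card['value'] for card in hand)
    value_counts = Counter(values)
    return {
        'distinct_values': len(value_counts),
        'max_occurrences': max(value_counts.values()),
        'value_range': values[-1] - values[0],
        'distinct_suits': len({card['suit'] for card in hand}),
    }

def is_one_pair(characteristics):
    # Four distinct values among five cards means exactly one value appears twice
    return characteristics['distinct_values'] == 4

n_simulations = 10_000
one_pair_count = 0
for _ in range(n_simulations):
    hand = random.sample(deck, 5)  # 5 distinct cards drawn at random
    if is_one_pair(hand_characteristics(hand)):
        one_pair_count += 1

one_pair_estimate = one_pair_count / n_simulations
print(one_pair_estimate)
```

Returning several characteristics in one dictionary keeps the classification rules for the remaining hand types short, since each rule only needs to combine values that are already computed.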

**You don't need to stick to the hint if you feel you have a better way to do it.** In case you have a better way, you can claim 10 bonus points for this assignment.

```python
deck = [{'value':i, 'suit':c}
        for c in ['spades', 'clubs', 'hearts', 'diamonds']
        for i in range(2,15)]
```