# Assignment - Statistics and Probability Practice

As you go through this notebook, you will find the symbol **???** in certain places. To complete this assignment, you must replace all the **???** with appropriate values, expressions, or statements to ensure that the notebook runs properly end-to-end. 

**Guidelines**

1. Make sure to run all the code cells in order. Otherwise, you may get errors like `NameError` for undefined variables.
2. Do not change variable names, delete cells, or disturb other existing code. It may cause problems during evaluation.
3. In some cases, you may need to add some code cells or new statements before or after the line of code containing the **???**. 
4. Since you'll be using a temporary online service for code execution, save your work by running `jovian.commit` at regular intervals.
5. Questions marked **(Optional)** will not be considered for evaluation and can be skipped. They are for your learning.
6. If you are stuck, you can ask for help on the bootcamp Slack group. Post errors, ask for hints, and help others, but **please don't share the complete solution code on Slack** to give others a chance to write the code themselves.
7. There are some tests included with this notebook to help you test your implementation. However, after submission, your code will be tested with some hidden test cases. Make sure to test your code exhaustively to cover all edge cases.

![image](https://i.imgur.com/v5GXvxy.jpg) 

### How to Run the Code and Save Your Work

**Option 1: Running using free online resources (1-click, recommended)**: Click the **Run** button at the top of this page and select **Run on Binder**. You can also select "Run on Colab" or "Run on Kaggle", but you'll need to create an account on [Google Colab](https://colab.research.google.com) or [Kaggle](https://kaggle.com) to use these platforms.


**Option 2: Running on your computer locally**: To run the code on your computer locally, you'll need to set up [Python](https://www.python.org) & [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/), download the notebook and install the required libraries. Click the **Run** button at the top of this page, select the **Run Locally** option, and follow the instructions.

**Saving your work**: You can save a snapshot of the assignment to your [Jovian](https://jovian.ai) profile, so that you can access it later and continue your work. Keep saving your work by running `jovian.commit` from time to time.

In [1]:
project_name='statistics-probability-practice-assignment'

In [None]:
!pip install jovian --upgrade --quiet

In [2]:
import jovian

In [3]:
jovian.commit(project=project_name, privacy='secret')

[jovian] Creating a new project "rakesh-rajagopalachary/statistics-probability-practice-assignment"
[jovian] Committed successfully! https://jovian.ai/rakesh-rajagopalachary/statistics-probability-practice-assignment


'https://jovian.ai/rakesh-rajagopalachary/statistics-probability-practice-assignment'

## Helper Functions

You may find the following helper functions useful in one or more questions.

In [4]:
def probability(matching_outcomes, total_outcomes):
    """Compute probability of an event when all outcomes are equally likely"""
    return matching_outcomes / total_outcomes

def union_probability(p_a, p_b, p_intersection):
    """Compute the probability of P(A or B) given P(A), P(B) and P(A and B)"""
    return p_a + p_b - p_intersection

def estimate_probability(matching_experiments, total_experiments):
    """Estimate the probability of an event by conducting many experiments"""
    return matching_experiments / total_experiments

def quartiles(nums):
    """Returns the quartiles for the given set of numbers"""
    return np.percentile(nums, 25), np.median(nums), np.percentile(nums, 75)

def datarange(nums):
    """Returns the minimum and maximum number in a set of numbers"""
    return min(nums), max(nums)

def count_occurrences(elements):
    """Returns a dictionary containing the number of occurrences of each unique element"""
    # Create a dictionary of results
    counts = {}
    # Go over each element in the list
    for element in elements:
        # Check if we already have an entry for it
        if element in counts:
            # Increment the count
            counts[element] += 1
        else:
            # If not present already, create an entry
            counts[element] = 1
    # Return the dictionary of results
    return counts

def mode(elements):
    """Returns a list containing the most frequently occurring element(s)"""
    # Count the number of occurrences of each value
    counts = count_occurrences(elements)
    # Get the maximum number of occurrences of any value
    max_count = max(counts.values())
    # Make a list of the matching elements
    results = []
    # Iterate over unique elements
    for element in counts:
        # Check if its count matches max_count
        if counts[element] == max_count:
            # Add it to results
            results.append(element)
    return results

In [5]:
# Uncomment if you're unable to import pandas or numpy
# !pip install pandas numpy --upgrade --quiet

In [6]:
import pandas as pd
import numpy as np

## Problems

Replace each occurrence of **???** with your answer. You can add new code cells if necessary. Optional problems can be skipped.

> **QUESTION 1**: A coin is tossed 10,000 times and results in a head 3,490 times. Estimate the probability of getting a tail when the coin is tossed. The variable `p_tail` should contain your answer.

In [15]:
10000-3490

6510

In [16]:
p_tail =  probability(6510, 10000)

In [17]:
p_tail

0.651

(Optional) If the same coin is tossed another 5,000 times, how many tails can you expect to get?
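
One possible way to approach this optional question, as a sketch: treat the estimated probability from Question 1 as the long-run fraction of tails and scale it by the number of additional tosses. The names `n_new_tosses` and `expected_tails` are illustrative and not part of the assignment.

```python
# Counts taken from Question 1
total_tosses = 10_000
heads = 3_490
tails = total_tosses - heads

# Estimated probability of a tail
p_tail_estimate = tails / total_tosses

# Expected number of tails in a further 5,000 tosses (hypothetical names)
n_new_tosses = 5_000
expected_tails = round(n_new_tosses * p_tail_estimate)
print(expected_tails)
```

Under this estimate, roughly 3,255 of the next 5,000 tosses would be expected to land tails; the actual count in any given run will vary around that figure.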

In [18]:
jovian.commit()

[jovian] Updating notebook "rakesh-rajagopalachary/statistics-probability-practice-assignment" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/rakesh-rajagopalachary/statistics-probability-practice-assignment


'https://jovian.ai/rakesh-rajagopalachary/statistics-probability-practice-assignment'

> **QUESTION 2**: The participants of an online course include 8 men from India, 9 women from India, 5 men from the USA, and 7 women from the USA. If a participant is picked at random, what is the probability that the participant is American or a woman? The variable `p_american_or_woman` should contain your answer.
>
> *Hint*: Identify the events "$A$", "$B$", "$A \textrm{ and } B$" and "$A \textrm{ or } B$", then use the `union_probability` function.
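
The hint can be sketched as follows, taking $A$ = "the participant is American" and $B$ = "the participant is a woman". The lowercase variable names are illustrative; the formula is the same one `union_probability` implements, written inline so the snippet runs on its own.

```python
# Participant counts from the question
indian_men, indian_women, usa_men, usa_women = 8, 9, 5, 7
total = indian_men + indian_women + usa_men + usa_women

# A: American, B: woman, A and B: American woman
p_a = (usa_men + usa_women) / total
p_b = (indian_women + usa_women) / total
p_a_and_b = usa_women / total

# P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = p_a + p_b - p_a_and_b

# Cross-check by counting the matching participants directly
direct = (usa_men + usa_women + indian_women) / total
print(p_a_or_b, direct)
```

Both routes count 21 of the 29 participants: everyone except the 8 Indian men. Subtracting the intersection matters because American women would otherwise be counted twice.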

In [19]:
INDIAN_MEN = 8
INDIAN_WOMEN=9
USA_MEN=5
USA_WOMEN=7
TOTAL= INDIAN_MEN+INDIAN_WOMEN+USA_MEN+USA_WOMEN
TOTAL

29

In [20]:
# "American or woman" covers everyone except the Indian men: 5 + 7 + 9 = 21 of 29
p_american_or_woman = probability(TOTAL - INDIAN_MEN, TOTAL)

In [21]:
p_american_or_woman

0.7241379310344828

In [22]:
jovian.commit()

[jovian] Updating notebook "rakesh-rajagopalachary/statistics-probability-practice-assignment" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/rakesh-rajagopalachary/statistics-probability-practice-assignment


'https://jovian.ai/rakesh-rajagopalachary/statistics-probability-practice-assignment'

> **QUESTION 3**: Download [this CSV file](https://gist.githubusercontent.com/aakashns/ee78201d1de1bd8f88cc4b03868b664b/raw/5f2e3baa1d4ab72130bf5dfb46c2f5f4d3c3ff74/us_population_density.csv) containing the population density (average number of persons per square km) and land area (square km) for various states and territories within the United States. The first few rows of the table are shown below. Find the overall population density of the United States. The variable `us_population_density` should contain your answer.
>
> <img src="https://i.imgur.com/7rOOMZg.png" width="480">
> 
> *Hint*: The population density of a region is obtained by dividing the total population of the region by its land area. You may find the functions `np.multiply` and `np.sum` useful.
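
To illustrate the hint, here is a minimal sketch with made-up numbers for two regions (not values from the CSV). It recovers each region's population as density times area, then divides the total population by the total area. It also shows why a plain average of the densities would give a different answer.

```python
import numpy as np

# Hypothetical regions: densities in persons per sq km, areas in sq km
densities = np.array([100.0, 50.0])
areas = np.array([1.0, 3.0])

# Population of each region = density * area
populations = np.multiply(densities, areas)

# Overall density = total population / total land area
overall_density = np.sum(populations) / np.sum(areas)

# A plain mean ignores how large each region is
naive_mean = np.mean(densities)
print(overall_density, naive_mean)
```

Here the overall density is 62.5 while the unweighted mean is 75.0, because the larger region is also the less dense one.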

In [23]:
population_density_csv = 'https://gist.githubusercontent.com/aakashns/ee78201d1de1bd8f88cc4b03868b664b/raw/5f2e3baa1d4ab72130bf5dfb46c2f5f4d3c3ff74/us_population_density.csv'
density_df = pd.read_csv(population_density_csv)

In [24]:
density_df

Unnamed: 0,state_or_territory,population_density_per_sq_km,land_area_sq_km
0,District of Columbia,4251.0,158.0
1,New Jersey,470.0,19046.8
2,Puerto Rico,404.0,9103.8
3,Rhode Island,394.0,2678.0
4,Massachusetts,336.0,20201.9
5,Guam,314.0,543.9
6,US Virgin Islands,308.0,347.1
7,Connecticut,286.0,12540.7
8,American Samoa,279.0,199.4
9,Maryland,238.0,25141.0


In [50]:
def population_density(population_density_per_sq_km, land_area_sq_km):
    population1 = (population_density_per_sq_km*land_area_sq_km)
    population = population1.sum()
    land_area_sq = land_area_sq_km.sum()
    density = (population/land_area_sq)  
    return density

In [51]:
population_density_per_sq_km = density_df['population_density_per_sq_km']
land_area_sq_km = density_df['land_area_sq_km']

In [52]:
us_population_density = population_density(population_density_per_sq_km, land_area_sq_km)

In [53]:
us_population_density

35.19090448744763

In [54]:
jovian.commit()

[jovian] Updating notebook "rakesh-rajagopalachary/statistics-probability-practice-assignment" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/rakesh-rajagopalachary/statistics-probability-practice-assignment


'https://jovian.ai/rakesh-rajagopalachary/statistics-probability-practice-assignment'

> **QUESTION 4**: Download this [CSV file](https://gist.githubusercontent.com/aakashns/f7d8f99c391f0727270e27e157460e3a/raw/2128978f3297f39ca4237a9ff843c80dd44ca4e3/stocks_returns.csv) containing information about returns on stock price for several public companies. Find the median, quartiles and range for the data. The variables `stocks_median`, `stocks_quartiles` and `stocks_range` should contain your answers.
>

In [55]:
stocks_csv = 'https://gist.githubusercontent.com/aakashns/f7d8f99c391f0727270e27e157460e3a/raw/2128978f3297f39ca4237a9ff843c80dd44ca4e3/stocks_returns.csv'
stocks_df = pd.read_csv(stocks_csv)

In [56]:
stocks_df

Unnamed: 0,symbol,returns
0,CMCSA,10.19
1,KMI,3.76
2,INTC,6.65
3,MU,25.67
4,GE,-22.70
...,...,...
3504,XBIO,-20.66
3505,XBIT,20.23
3506,XELB,7.26
3507,XTLB,-35.06


In [84]:
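# Sort the returns, then pick the two middle values by position.
# sort_values keeps the original index labels, so use .iloc (position), not .loc (label).
returns_sorted = stocks_df['returns'].sort_values()
mid = len(returns_sorted) // 2
t1 = returns_sorted.iloc[mid - 1]
t2 = returns_sorted.iloc[mid]
median = (t1 + t2) / 2  # assumes an even number of rows: average the two middle values
median

2.47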
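
A note on computing a median by hand with pandas: `sort_values` keeps each row's original index label, so `.loc[k]` returns the row labelled `k` rather than the k-th smallest value, while `.iloc[k]` selects by position. The order of operations matters as well, since `a + b / 2` is not the same as `(a + b) / 2`. A small sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical values; index labels are 0, 1, 2, 3
values = pd.Series([30, 10, 20, 40])
values_sorted = values.sort_values()  # label order becomes 1, 2, 0, 3

by_label = values_sorted.loc[0]      # row labelled 0, which holds 30
by_position = values_sorted.iloc[0]  # first row after sorting, which holds 10

# Median of an even-length series: mean of the two middle values by position
mid = len(values_sorted) // 2
manual_median = (values_sorted.iloc[mid - 1] + values_sorted.iloc[mid]) / 2
print(by_label, by_position, manual_median, values.median())
```

With positional indexing and the parentheses in place, the manual result agrees with `Series.median()`.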

In [85]:
import statistics

In [86]:
stocks_median = statistics.median(stocks_df['returns'])
stocks_median

2.47

In [87]:
stocks_quartiles = quartiles(stocks_df['returns'])

In [88]:
stocks_range = datarange(stocks_df['returns'])

In [89]:
stocks_median, stocks_quartiles, stocks_range

(2.47, (-2.58, 2.47, 9.01), (-88.74, 338.36))
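
To see what the `quartiles` and `datarange` helpers compute, here is a small sketch on a made-up list of eight numbers. The variable names and data are illustrative. `np.percentile` uses linear interpolation between neighbouring values by default, which is why the quartiles need not be members of the list.

```python
import numpy as np

# Hypothetical sample, already sorted for readability
sample = [1, 2, 3, 4, 5, 6, 7, 8]

q1 = np.percentile(sample, 25)  # lower quartile
q2 = np.median(sample)          # median, i.e. the 50th percentile
q3 = np.percentile(sample, 75)  # upper quartile

# Range as a (minimum, maximum) pair, and the interquartile range
data_range = (min(sample), max(sample))
iqr = q3 - q1
print(q1, q2, q3, data_range, iqr)
```

For this sample the quartiles are 2.75, 4.5 and 6.25, and the range is (1, 8).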

In [90]:
jovian.commit()

[jovian] Updating notebook "rakesh-rajagopalachary/statistics-probability-practice-assignment" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/rakesh-rajagopalachary/statistics-probability-practice-assignment


'https://jovian.ai/rakesh-rajagopalachary/statistics-probability-practice-assignment'

> **QUESTION 5**: Your friend Alex designs shoes as a hobby, and you've persuaded her to start selling her shoes online. You've found a vendor who can manufacture, store and ship the shoes for you. You've requested some sample units from the vendor to inspect the quality of the shoes. However, the vendor does not manufacture fewer than 100 units of a single shoe size. Which shoe size would you like them to manufacture? The variable `shoe_size` should contain your answer.
>
> You may find this [CSV file](https://gist.githubusercontent.com/aakashns/f4655dd2a33c176aa60874dafe838260/raw/52915d2edbcfe7a2e961d10745510de9aa78d09e/shoes.csv) containing a list of shoes recently sold by an online store useful.
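
Since only one size can be ordered, a reasonable choice is the size that sells most often, i.e. the mode. The sketch below finds the mode of a short made-up list of sizes with `collections.Counter` from the standard library; it is an alternative to the `mode` helper above, not the method the assignment requires.

```python
from collections import Counter

# Hypothetical list of shoe sizes sold
sizes = [7, 9, 9, 8, 9, 7, 10]

# Count how often each size appears
counts = Counter(sizes)

# Keep every size whose count equals the highest count (ties are possible)
max_count = max(counts.values())
modes = [size for size, count in counts.items() if count == max_count]
print(modes)
```

The result is a list because a dataset can have more than one mode; when there is a single mode, its only element is the size to order.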



In [91]:
import pandas as pd

shoes_url = 'https://gist.githubusercontent.com/aakashns/f4655dd2a33c176aa60874dafe838260/raw/52915d2edbcfe7a2e961d10745510de9aa78d09e/shoes.csv'
shoes_df = pd.read_csv(shoes_url)
shoes_df

Unnamed: 0,company,price,size
0,Puma,"₹2,969",9
1,Puma,"₹2,279",8
2,Deals4you,₹397,7
3,MILESWALKER,₹379,11
4,Chevit,₹389,7
...,...,...,...
1035,Wika,₹449,4
1036,Hotstyle,₹449,10
1037,Bond Street By Red Tape,"₹1,048",7
1038,Shoes Kingdom,₹719,6


In [92]:
# mode() returns a list; it has a single element here, so take that as the answer
shoe_size = int(mode(shoes_df['size'])[0])

In [93]:
shoe_size

9

In [94]:
jovian.commit()

[jovian] Updating notebook "rakesh-rajagopalachary/statistics-probability-practice-assignment" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/rakesh-rajagopalachary/statistics-probability-practice-assignment


'https://jovian.ai/rakesh-rajagopalachary/statistics-probability-practice-assignment'

## Make a Submission

Run the following code cell to make a submission. Alternatively, you can submit your Jovian notebook link on the assignment page. You can make any number of submissions.



In [95]:
jovian.submit('dsmlbootcamp-stats')

[jovian] Updating notebook "rakesh-rajagopalachary/statistics-probability-practice-assignment" on https://jovian.ai/
[jovian] Committed successfully! https://jovian.ai/rakesh-rajagopalachary/statistics-probability-practice-assignment
[jovian] Submitting assignment..
[jovian] Verify your submission at https://jovian.ai/learn/zero-to-data-analyst-bootcamp/assignment/statistics-probability-practice


The rest of this assignment is optional, and can be skipped.

> **(OPTIONAL) QUESTION 6**: The table below shows the total no. of goals scored in the FIFA Soccer World Cup 2018 by teams participating in the tournament. 12 teams scored a total of 2 goals each in the tournament, 4 teams scored a total of 3 goals each, 1 team scored a total of 4 goals, 2 teams scored a total of 6 goals each and so on. 
>
> <img src="https://i.imgur.com/4DwHIam.png" width="240">
>
> Answer the following questions using the above data:
>
> 1. Find the total number of goals scored in the tournament.
> 2. Find the total number of teams in the tournament.
> 3. Find the average number of goals scored by a team in the tournament.
> 4. Find the median number of goals scored by a team in the tournament.
> 5. Find the range and quartiles for the number of goals scored by a team in the tournament.
> 6. Visualize the range and quartiles of the number of goals using a box plot.
> 7. Find the mode of the number of goals scored by a team in the tournament.
> 8. What is the maximum number of goals scored by a team in the tournament?
> 9. What is the minimum number of goals scored by a team in the tournament?
> 10. Find the standard deviation of the number of goals scored by a team in the tournament.
> 11. If you pick one of the teams that participated in the tournament at random, what is the probability that the team scored fewer than three goals in the tournament?
> 
> 12. Find the average number of goals scored per match in the tournament (you may need additional information).
> 13. If you randomly pick one of the goals scored in the tournament, what is the probability that the goal was scored by Belgium?
>
> The table is also available as a CSV file [here](https://gist.githubusercontent.com/aakashns/80896b90166ac9e81fb3e11f15ba3dd3/raw/95f8d847a82f46566cd45fbd7a72b046e2b52a5c/gistfile1.txt).
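
A frequency table like this one can be summarised by weighting each goal count by the number of teams that scored it. The sketch below uses only the four rows quoted in the question text (2, 3, 4 and 6 goals); the full table has more rows, so these numbers are not the answers. The variable names are illustrative.

```python
import numpy as np

# Partial frequency table, taken from the rows quoted in the question
goals_per_team = np.array([2, 3, 4, 6])
number_of_teams = np.array([12, 4, 1, 2])

# Totals: weight each goal count by how many teams scored it
total_goals = int(np.sum(goals_per_team * number_of_teams))
total_teams = int(np.sum(number_of_teams))
mean_goals = total_goals / total_teams

# Expand the table into one entry per team to reuse median, quartiles, etc.
expanded = np.repeat(goals_per_team, number_of_teams)
median_goals = float(np.median(expanded))
print(total_goals, total_teams, mean_goals, median_goals)
```

Expanding the table with `np.repeat` produces one value per team, so helpers written for plain lists of numbers (such as `quartiles`, `datarange` and `mode`) can be applied to the result.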



In [None]:
goals_url = 'https://gist.githubusercontent.com/aakashns/80896b90166ac9e81fb3e11f15ba3dd3/raw/95f8d847a82f46566cd45fbd7a72b046e2b52a5c/gistfile1.txt'

In [None]:
jovian.commit()