# Data Analysis with Python

> Exercises: Basic Pandas

Kuo, Yao-Jen <yaojenkuo@datainpoint.com> from [DATAINPOINT](https://www.datainpoint.com)

## Instructions

- We've imported necessary modules/libraries at the beginning of each exercise.
- We've defined the names of functions/inputs/arguments for you.
- Write your solution between the comments `### BEGIN SOLUTION` and `### END SOLUTION`.
- Run the tests to check whether your solutions are right: Kernel -> Restart & Run All -> Restart and Run All Cells.
- You can run tests after each question or after finishing all questions.

In [1]:
import unittest
import pandas as pd

## 01. Define a function named `get_olympic_df` that imports a CSV file as a pandas DataFrame.

- Expected inputs: a CSV file `all_time_olympic_medals.csv`.
- Expected outputs: a (153, 17) DataFrame.
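
As a minimal sketch of the general technique (not the solution itself), `pd.read_csv` accepts a file path or a file-like object and returns a DataFrame. The in-memory text and the column names `team_name` and `golds` below are made-up stand-ins for a real CSV file on disk.

```python
import io

import pandas as pd

# Hypothetical three-row sample; a real exercise would pass a file path instead.
csv_text = "team_name,golds\nAlpha,3\nBeta,0\nGamma,7\n"

# read_csv treats the first line as the header row by default.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(type(sample_df))
print(sample_df.shape)  # (3, 2): three data rows, two columns
```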

In [2]:
def get_olympic_df(csv_file_path):
    """
    >>> olympic_df = get_olympic_df('all_time_olympic_medals.csv')
    >>> print(type(olympic_df))
    <class 'pandas.core.frame.DataFrame'>
    >>> print(olympic_df.shape)
    (153, 17)
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

## 02. Define a function named `find_taiwan` that retrieves the data of Taiwan as a pandas DataFrame.

Note: Taiwan might not be listed as "Taiwan" in the Olympic data.

- Expected inputs: a CSV file `all_time_olympic_medals.csv`.
- Expected outputs: a (1, 17) DataFrame.
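
One common way to pull out matching rows while keeping a DataFrame is a boolean mask. The sketch below uses a made-up frame; the team labels and the `golds` column are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data, not the Olympic file.
teams = pd.DataFrame({
    "team_name": ["Alpha", "Beta", "Gamma"],
    "golds": [3, 0, 7],
})

# Comparing a column to a value yields a boolean Series (the mask).
mask = teams["team_name"] == "Beta"

# Indexing with the mask keeps only the True rows and returns a DataFrame.
beta = teams[mask]
print(beta.shape)  # (1, 2)

# When the exact label is unknown, a substring search can help locate it.
partial = teams[teams["team_name"].str.contains("amm")]
print(partial["team_name"].values[0])  # Gamma
```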

In [3]:
def find_taiwan(csv_file_path):
    """
    >>> taiwan = find_taiwan('all_time_olympic_medals.csv')
    >>> print(type(taiwan))
    <class 'pandas.core.frame.DataFrame'>
    >>> print(taiwan.shape)
    (1, 17)
    >>> print(taiwan['team_name'].values[0])
    Chinese Taipei
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

## 03. Define a function named `find_the_king_of_summer_olympics` that retrieves the data of the country that won the most gold medals in the Summer Olympics.

- Expected inputs: a CSV file `all_time_olympic_medals.csv`.
- Expected outputs: a (1, 17) DataFrame.
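
A row holding a column's maximum can be selected by comparing the column against its own maximum. This is only a sketch on made-up data; `medals` and its `golds` column are hypothetical names.

```python
import pandas as pd

# Hypothetical data, not the Olympic file.
medals = pd.DataFrame({
    "team_name": ["Alpha", "Beta", "Gamma"],
    "golds": [3, 9, 7],
})

# The mask is True only where the value equals the column maximum.
is_max = medals["golds"] == medals["golds"].max()

# Boolean indexing returns a DataFrame, here with a single row.
top = medals[is_max]
print(top.shape)                    # (1, 2)
print(top["team_name"].values[0])   # Beta
```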

In [4]:
def find_the_king_of_summer_olympics(csv_file_path):
    """
    >>> the_king_of_summer_olympics = find_the_king_of_summer_olympics('all_time_olympic_medals.csv')
    >>> print(type(the_king_of_summer_olympics))
    <class 'pandas.core.frame.DataFrame'>
    >>> print(the_king_of_summer_olympics.shape)
    (1, 17)
    >>> print(the_king_of_summer_olympics['no_summer_golds'].values[0])
    1022
    >>> print(the_king_of_summer_olympics['team_name'].values[0])
    United States
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

## 04. Define a function named `find_the_king_of_winter_olympics` that retrieves the data of the country that won the most gold medals in the Winter Olympics.

- Expected inputs: a CSV file `all_time_olympic_medals.csv`.
- Expected outputs: a (1, 17) DataFrame.
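
An alternative to a maximum mask is `DataFrame.nlargest`, which also returns a DataFrame. The sketch below, on made-up data with hypothetical column names, shows how the two approaches differ when values are tied.

```python
import pandas as pd

# Hypothetical data with a tie for first place.
tied = pd.DataFrame({
    "team_name": ["Alpha", "Beta", "Gamma"],
    "golds": [9, 9, 7],
})

# nlargest(1, ...) keeps exactly one row; by default the first of any tied rows.
first_only = tied.nlargest(1, "golds")
print(first_only.shape)  # (1, 2)

# A maximum mask keeps every row that shares the top value.
all_tied = tied[tied["golds"] == tied["golds"].max()]
print(all_tied.shape)    # (2, 2)
```

Which behavior is preferable depends on whether tied rows should all be reported; when the maximum is unique, both give the same single row.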

In [5]:
def find_the_king_of_winter_olympics(csv_file_path):
    """
    >>> the_king_of_winter_olympics = find_the_king_of_winter_olympics('all_time_olympic_medals.csv')
    >>> print(type(the_king_of_winter_olympics))
    <class 'pandas.core.frame.DataFrame'>
    >>> print(the_king_of_winter_olympics.shape)
    (1, 17)
    >>> print(the_king_of_winter_olympics['no_winter_golds'].values[0])
    132
    >>> print(the_king_of_winter_olympics['team_name'].values[0])
    Norway
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

## 05. Define a function named `find_largest_ratio` that retrieves the data of the country with the largest ratio, calculated as:

\begin{equation}
\text{Ratio} = \frac{\text{Summer Gold} - \text{Winter Gold}}{\text{Total Gold}}
\end{equation}

- Expected inputs: a CSV file `all_time_olympic_medals.csv`.
- Expected outputs: a Series of size 17.

Note: Exclude the countries whose ratio equals 1.
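
The arithmetic and the exclusion rule can be sketched on a toy frame. The column names `summer` and `winter` are hypothetical, and the total is assumed here to be their sum. Selecting a single row label with `.loc` returns a Series rather than a DataFrame.

```python
import pandas as pd

# Hypothetical data, not the Olympic file.
toy = pd.DataFrame({
    "team_name": ["Alpha", "Beta", "Gamma"],
    "summer": [10, 6, 3],
    "winter": [0, 2, 3],
})

# Assumption: total gold is summer gold plus winter gold.
total = toy["summer"] + toy["winter"]
ratio = (toy["summer"] - toy["winter"]) / total
# Alpha: 10 / 10 = 1.0, Beta: 4 / 8 = 0.5, Gamma: 0 / 6 = 0.0

# Drop ratios equal to 1, then find the index label of the largest remaining one.
candidates = ratio[ratio != 1]
top_label = candidates.idxmax()

# A scalar label in .loc yields one row as a Series.
top_row = toy.loc[top_label]
print(type(top_row))
print(top_row.size)          # 3, one entry per column
print(top_row["team_name"])  # Beta
```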

In [6]:
def find_largest_ratio(csv_file_path):
    """
    >>> largest_ratio = find_largest_ratio('all_time_olympic_medals.csv')
    >>> print(type(largest_ratio))
    <class 'pandas.core.series.Series'>
    >>> print(largest_ratio.size)
    17
    >>> print(largest_ratio['team_name'])
    Hungary
    """
    ### BEGIN SOLUTION
    
    ### END SOLUTION

## Run tests!

Kernel -> Restart & Run All -> Restart and Run All Cells.

In [7]:
class TestBasicPandas(unittest.TestCase):
    def test_00_get_olympic_df(self):
        olympic_df = get_olympic_df('all_time_olympic_medals.csv')
        self.assertIsInstance(olympic_df, pd.core.frame.DataFrame)
        self.assertEqual(olympic_df.shape, (153, 17))
    def test_01_find_taiwan(self):
        taiwan = find_taiwan('all_time_olympic_medals.csv')
        self.assertIsInstance(taiwan, pd.core.frame.DataFrame)
        self.assertEqual(taiwan.shape, (1, 17))
        self.assertEqual(taiwan['team_name'].values[0], 'Chinese Taipei')
    def test_02_find_the_king_of_summer_olympics(self):
        the_king_of_summer_olympics = find_the_king_of_summer_olympics('all_time_olympic_medals.csv')
        self.assertIsInstance(the_king_of_summer_olympics, pd.core.frame.DataFrame)
        self.assertEqual(the_king_of_summer_olympics.shape, (1, 17))
        self.assertEqual(the_king_of_summer_olympics['no_summer_golds'].values[0], 1022)
        self.assertEqual(the_king_of_summer_olympics['team_name'].values[0], 'United States')
    def test_03_find_the_king_of_winter_olympics(self):
        the_king_of_winter_olympics = find_the_king_of_winter_olympics('all_time_olympic_medals.csv')
        self.assertIsInstance(the_king_of_winter_olympics, pd.core.frame.DataFrame)
        self.assertEqual(the_king_of_winter_olympics.shape, (1, 17))
        self.assertEqual(the_king_of_winter_olympics['no_winter_golds'].values[0], 132)
        self.assertEqual(the_king_of_winter_olympics['team_name'].values[0], 'Norway')
    def test_04_find_largest_ratio(self):
        largest_ratio = find_largest_ratio('all_time_olympic_medals.csv')
        self.assertIsInstance(largest_ratio, pd.core.series.Series)
        self.assertEqual(largest_ratio.size, 17)
        self.assertEqual(largest_ratio['team_name'], 'Hungary')

suite = unittest.TestLoader().loadTestsFromTestCase(TestBasicPandas)
runner = unittest.TextTestRunner(verbosity=2)
test_results = runner.run(suite)
number_of_failures = len(test_results.failures)
number_of_errors = len(test_results.errors)
number_of_test_runs = test_results.testsRun
number_of_successes = number_of_test_runs - (number_of_failures + number_of_errors)

test_00_get_olympic_df (__main__.TestBasicPandas) ... FAIL
test_01_find_taiwan (__main__.TestBasicPandas) ... FAIL
test_02_find_the_king_of_summer_olympics (__main__.TestBasicPandas) ... FAIL
test_03_find_the_king_of_winter_olympics (__main__.TestBasicPandas) ... FAIL
test_04_find_largest_ratio (__main__.TestBasicPandas) ... FAIL

FAIL: test_00_get_olympic_df (__main__.TestBasicPandas)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-7-bf7c72d137fc>", line 4, in test_00_get_olympic_df
    self.assertIsInstance(olympic_df, pd.core.frame.DataFrame)
AssertionError: None is not an instance of <class 'pandas.core.frame.DataFrame'>

FAIL: test_01_find_taiwan (__main__.TestBasicPandas)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-7-bf7c72d137fc>", line 8, in test_01_find_taiwan
    self.assertIsInstance(taiwan, pd.core.frame.DataFr

In [8]:
print("You've got {} successes among {} questions.".format(number_of_successes, number_of_test_runs))

You've got 0 successes among 5 questions.
