# Objective
Dive deep into working with datasets using _pandas_.

# Things To Learn
* Creating and working with _pandas dataframes_ and _series_.
* Loading and getting an overview of data.
* Manipulating datasets.
* Selecting, grouping and sorting data.

# Submission Guidelines
* Your finished _Jupyter Notebook_ - both as `.ipynb` and exported `.pdf`.

# Task: Manually Creating A _Dataframe_ (Fabian Oppermann)

**Task:**
Import _pandas_ and manually create a _dataframe_ containing data about a domain **you're passionate about**. Your dataframe should...

* ...contain at least 4 rows.
* ...contain at least 3 columns, of which at least one should be numeric and one should be suitable as a _key_ (i.e. _index_).

Next, manually create a _series_ and add it as an additional column to your dataset.

Finally, set the index of your dataframe to a suitable column.

In [3]:
import pandas as pd

# df = pd.read_csv('')

# TODO
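
The steps above can be sketched as follows. This is only an illustration with a made-up board game domain: the names `games_df` and `year_series` and the numbers in the `MaxPlayers` and `Minutes` columns are assumptions chosen for the example, so pick your own domain and values.

```python
import pandas as pd

# Hypothetical example data (illustrative values, not part of the ramen dataset).
games_df = pd.DataFrame({
    'Title': ['Catan', 'Carcassonne', 'Azul', 'Pandemic'],
    'MaxPlayers': [4, 5, 4, 4],
    'Minutes': [90, 45, 40, 60],
})

# A series assigned as a column is aligned with the dataframe by index label.
year_series = pd.Series([1995, 2000, 2017, 2008], name='Year')
games_df['Year'] = year_series

# The title is unique per row, so it can serve as the key (index).
games_df = games_df.set_index('Title')
print(games_df)
```

Note that `set_index` returns a new dataframe, which is why the result is assigned back to `games_df`.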

# Task: Importing And Getting An Overview (Fabian Oppermann)

Read the _ramen_ ratings from the provided file and get an overview of the data:

* Take a look at some rows from the top, bottom or random positions...
* Use summary functions to get a statistical overview of the data.
* Find out what data types we're dealing with.
* Identify what column would be suitable as index and set it!
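
For the last two bullet points, a minimal sketch on a tiny stand-in dataframe may help. The data in `sample_df` is hypothetical and only mimics a few ramen columns; the idea is that a column qualifies as index when its values are unique.

```python
import pandas as pd

# Hypothetical mini dataset mimicking some of the ramen columns.
sample_df = pd.DataFrame({
    'Review #': [3, 2, 1],
    'Brand': ['Nissin', 'Vifon', 'Knorr'],
    'Stars': ['3.75', '1', '5'],
})

# Data type of every column; numbers stored as text do not show up as numeric.
print(sample_df.dtypes)

# A suitable index column identifies each row uniquely.
print(sample_df['Review #'].is_unique)

sample_df = sample_df.set_index('Review #')
print(sample_df.head())
```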

In [None]:
ramen_df = pd.read_csv('./ramen-ratings.csv')

print(ramen_df.head()) # Top
print('-' * 50)
print(ramen_df.tail()) # Bottom
print('-' * 50)
print(ramen_df.sample(5)) # Random
print('-' * 50)
print(ramen_df.describe()) # Summary
print('-' * 50)
ramen_df.info() # Info (prints its report itself; wrapping it in print() adds a stray "None")

   Review #           Brand  \
0      2580       New Touch   
1      2579        Just Way   
2      2578          Nissin   
3      2577         Wei Lih   
4      2576  Ching's Secret   

                                             Variety Style Country Stars  \
0                          T's Restaurant Tantanmen    Cup   Japan  3.75   
1  Noodles Spicy Hot Sesame Spicy Hot Sesame Guan...  Pack  Taiwan     1   
2                      Cup Noodles Chicken Vegetable   Cup     USA  2.25   
3                      GGE Ramen Snack Tomato Flavor  Pack  Taiwan  2.75   
4                                    Singapore Curry  Pack   India  3.75   

  Top Ten  
0     NaN  
1     NaN  
2     NaN  
3     NaN  
4     NaN  
--------------------------------------------------
      Review #     Brand                                            Variety  \
2575         5     Vifon  Hu Tiu Nam Vang ["Phnom Penh" style] Asian Sty...   
2576         4   Wai Wai                     Oriental Style Instant Noodles

# Task: Dealing With Erroneous Rows

As you surely noticed, the star ratings are not included in the numerical statistics. Find out why and fix this, so we can work with the column:

* Take a look at the column's data type (should have been determined in the last task). Try casting it to `float`!
* Find out all the unique values in the column.
* Now that you know the faulty values, identify their rows and drop them. This should affect 3 rows.
* Try casting the column again and make sure we can now calculate statistics on it by printing its mean.
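
One possible way to walk through these steps is sketched below on a tiny stand-in column. The dataframe `sample_df` and the faulty value `'Unrated'` are assumptions made for the example; check the unique values of the real column to see what actually needs to be dropped.

```python
import pandas as pd

# Hypothetical mini column: numbers stored as text plus one faulty entry.
sample_df = pd.DataFrame({'Stars': ['3.75', '1', 'Unrated', '5']})
print(sample_df['Stars'].unique())

# errors='coerce' turns every value that cannot be parsed as a number into NaN,
# which makes the faulty rows easy to find.
parsed = pd.to_numeric(sample_df['Stars'], errors='coerce')
faulty_rows = sample_df[parsed.isna()]
print(faulty_rows)

# Drop the faulty rows by their index labels, then cast the cleaned column.
sample_df = sample_df.drop(faulty_rows.index)
sample_df['Stars'] = sample_df['Stars'].astype(float)
print(sample_df['Stars'].mean())
```

Using `pd.to_numeric` here is just one option; filtering with a comparison against the faulty values found via `unique()` works as well.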

In [None]:
# Try casting 'Stars' to float; this fails while non-numeric values remain
try:
    ramen_df = ramen_df.astype({'Stars': float})
except ValueError as error:
    print(error)

# Unique values
print(ramen_df['Stars'].unique())

# Task: Missing Values (Fabian Oppermann)

We don't want to deal with missing values in this example. Do the following:

* Find out how many values are missing in each column.
* One column should be missing 2 values. Identify the relevant rows and delete them.
* One column should be missing a lot of values. Delete the whole column.
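
The three steps above might look like this on a small stand-in dataframe. The contents of `sample_df` are hypothetical and only chosen so that one column misses a single value and another misses most of them.

```python
import pandas as pd

# Hypothetical mini dataset with missing values (None is treated as missing).
sample_df = pd.DataFrame({
    'Brand': ['A', 'B', 'C', 'D'],
    'Style': ['Cup', None, 'Pack', 'Bowl'],
    'Top Ten': [None, None, None, '2016 #1'],
})

# Number of missing values per column.
missing = sample_df.isnull().sum()
print(missing)

# Show, then delete, the rows where 'Style' is missing.
print(sample_df[sample_df['Style'].isnull()])
sample_df = sample_df.dropna(subset=['Style'])

# Delete the column that is missing most of its values.
sample_df = sample_df.drop(columns=['Top Ten'])
print(sample_df)
```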

In [None]:
# Missing values
print(ramen_df.isnull().sum())

ramen_df.isnull().sum(axis=0) > 2

Review #       0
Brand          0
Variety        0
Style          2
Country        0
Stars          0
Top Ten     2539
dtype: int64


# Task: Create A Price-Column (Fabian Oppermann)

Unfortunately the ramen reviews don't contain a price column. Let's _fake_ one with rough estimates:

* A _pack_ of ramen should cost 0.79.
* A _bowl_ of ramen should cost 1.79.
* A _cup_ of ramen should cost 1.29.
* A _tray_ of ramen should cost 2.19.
* All other types should cost 1.09.

Create a function calculating the price based on the style. Use `map` to create a series and add it as a new column to your dataframe.

In [11]:
COST_PACK = 0.79
COST_BOWL = 1.79
COST_CUP = 1.29
COST_TRAY = 2.19
COST_OTHER = 1.09


def calculate_price(style):
    # Map a ramen style to its estimated price; unknown styles get COST_OTHER
    if style == 'Pack':
        return COST_PACK
    elif style == 'Bowl':
        return COST_BOWL
    elif style == 'Cup':
        return COST_CUP
    elif style == 'Tray':
        return COST_TRAY
    else:
        return COST_OTHER


ramen_df['Price'] = ramen_df['Style'].map(calculate_price)

print(ramen_df.head())

   Review #           Brand  \
0      2580       New Touch   
1      2579        Just Way   
2      2578          Nissin   
3      2577         Wei Lih   
4      2576  Ching's Secret   

                                             Variety Style Country Stars  \
0                          T's Restaurant Tantanmen    Cup   Japan  3.75   
1  Noodles Spicy Hot Sesame Spicy Hot Sesame Guan...  Pack  Taiwan     1   
2                      Cup Noodles Chicken Vegetable   Cup     USA  2.25   
3                      GGE Ramen Snack Tomato Flavor  Pack  Taiwan  2.75   
4                                    Singapore Curry  Pack   India  3.75   

  Top Ten  Price  
0     NaN   1.29  
1     NaN   0.79  
2     NaN   1.29  
3     NaN   0.79  
4     NaN   0.79  
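
As a side note, `map` also accepts a dictionary instead of a function. The sketch below is an alternative to the function-based approach, not a replacement for it; the names `PRICES` and `styles` and the style `'Box'` are assumptions made for the example. Values without a dictionary entry become `NaN`, so the fallback price is filled in afterwards.

```python
import pandas as pd

# Prices per style as listed in the task; every other style costs 1.09.
PRICES = {'Pack': 0.79, 'Bowl': 1.79, 'Cup': 1.29, 'Tray': 2.19}

# Hypothetical sample of styles, including one that is not in the dictionary.
styles = pd.Series(['Cup', 'Pack', 'Box', 'Tray'])

# Unknown styles map to NaN first and then receive the fallback price.
prices = styles.map(PRICES).fillna(1.09)
print(prices.tolist())
```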


# Task: From Stars To Points (Fabian Oppermann)

Let's switch from a _star rating_ to a _point rating_ between 0 and 100.

* Calculate the points based on the stars, where 5 stars equal 100 points.
* Change the column name from _Stars_ to _Points_.
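
Since 5 stars equal 100 points, each star is worth 100 / 5 = 20 points. A minimal sketch on a hypothetical stand-in dataframe `sample_df`:

```python
import pandas as pd

# Hypothetical star ratings, already cast to float.
sample_df = pd.DataFrame({'Stars': [5.0, 3.75, 0.0]})

# 5 stars correspond to 100 points, so one star is worth 20 points.
POINTS_PER_STAR = 100 / 5
sample_df['Stars'] = sample_df['Stars'] * POINTS_PER_STAR

# rename returns a new dataframe with the changed column name.
sample_df = sample_df.rename(columns={'Stars': 'Points'})
print(sample_df)
```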

# Task: Create A Recommendation-Column

Let's create a new column containing a textual recommendation:

* Ramen with 90 or more points get either _Amazing value!_ (for prices below 1.3) or _Expensive but delicious!_.
* Other ramen with more than 80 points should read _Must-Try!_.
* Other cheap ramen with a price below 1 should read _Budget choice!_.
* All other ramen should read _Why not?_.

Since you need multiple columns to calculate the recommendation, you need to use `apply`. Check how often each recommendation text appears afterwards!
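
The rules above can be sketched with a row-wise `apply` as shown below. The function name `recommend` and the values in `sample_df` are assumptions made for the example; with `axis=1` the function receives one row at a time and can read several columns from it.

```python
import pandas as pd


def recommend(row):
    # The order of the checks mirrors the order of the rules in the task.
    if row['Points'] >= 90:
        return 'Amazing value!' if row['Price'] < 1.3 else 'Expensive but delicious!'
    if row['Points'] > 80:
        return 'Must-Try!'
    if row['Price'] < 1:
        return 'Budget choice!'
    return 'Why not?'


# Hypothetical rows, one for each possible recommendation.
sample_df = pd.DataFrame({
    'Points': [95.0, 90.0, 85.0, 50.0, 50.0],
    'Price': [0.79, 1.79, 1.29, 0.79, 1.29],
})

sample_df['Recommendation'] = sample_df.apply(recommend, axis=1)

# Count how often each recommendation text appears.
print(sample_df['Recommendation'].value_counts())
```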

# Task: Export Data

At this point it makes sense to back up our processed dataframe. Export the data to a file `ramen_processed.csv` in a subfolder `output` and compare your results to the provided solution.
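
A minimal sketch of such an export follows. To keep the example self-contained it writes into a temporary directory; `base_dir` and the contents of `sample_df` are assumptions, and in your notebook `base_dir` would simply be the notebook folder (e.g. `Path('.')`).

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical processed data with the review number as index.
sample_df = pd.DataFrame(
    {'Brand': ['Nissin', 'Vifon'], 'Points': [75.0, 60.0]},
    index=pd.Index([2, 1], name='Review #'),
)

# Stand-in for the notebook folder; use Path('.') in the actual notebook.
base_dir = Path(tempfile.mkdtemp())

# to_csv does not create missing folders, so create the subfolder first.
output_dir = base_dir / 'output'
output_dir.mkdir(parents=True, exist_ok=True)

csv_path = output_dir / 'ramen_processed.csv'
sample_df.to_csv(csv_path)

# Read the file back to check that the index column survived the round trip.
restored_df = pd.read_csv(csv_path, index_col='Review #')
print(restored_df)
```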

# Task: Selecting Data (_Integer-Based_)

You can now either use the dataset you've worked on so far or - if in doubt - load our `ramen_processed_solution.csv` for the next tasks.

Solve the following tasks to sharpen your selecting skills:

* Get the first two columns of the last ten rows.
* Get the 4<sup>th</sup> column of the 15<sup>th</sup> row.
* Get the second and the last column of the second-to-last ten rows (i.e. the ten rows before the last ten).
* Get everything but the first column of the 20<sup>th</sup>, 30<sup>th</sup>, 40<sup>th</sup> and 50<sup>th</sup> row.
* Get every column of the 100<sup>th</sup> up to (and including) the 200<sup>th</sup> row.
* Get the third column of every row.
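
The patterns needed here can be illustrated with `iloc` on a small stand-in dataframe. The data and variable names below are hypothetical and deliberately do not solve the tasks themselves. Keep in mind that positions are 0-based and that the end of a slice is exclusive.

```python
import pandas as pd

# Hypothetical dataframe with 10 rows and 4 columns.
sample_df = pd.DataFrame({
    'a': range(10),
    'b': range(10, 20),
    'c': range(20, 30),
    'd': range(30, 40),
})

# First two columns of the last three rows.
last_rows = sample_df.iloc[-3:, :2]

# 4th column of the 5th row: positions 3 and 4, since counting starts at 0.
single = sample_df.iloc[4, 3]

# Everything but the first column for a list of row positions.
picked = sample_df.iloc[[1, 3, 5], 1:]

# 3rd up to and including the 5th row: the slice end (5) is exclusive.
inclusive = sample_df.iloc[2:5]

print(last_rows, single, picked, inclusive, sep='\n\n')
```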

# Task: Selecting Data (_Label-Based_)

Solve the following tasks to sharpen your selecting skills:

* Get the variety and textual recommendation for the review _#1235_.
* Get all columns but the textual recommendation for the reviews from (and including) _#5_ to (and including) _#10_.
* Get the style and points for the reviews _#123_, _#234_ and _#345_.
* Get the variety, style and country for all reviews.
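
Label-based selection with `loc` can be sketched as follows, assuming the review number has been set as index. The data in `sample_df` is hypothetical. Unlike `iloc`, label slices include both endpoints, and they follow the order of the rows, which matters when the index is sorted in descending order as in this example.

```python
import pandas as pd

# Hypothetical data with a descending review number as index.
sample_df = pd.DataFrame(
    {
        'Variety': ['Curry', 'Chicken', 'Beef', 'Shrimp'],
        'Style': ['Pack', 'Cup', 'Bowl', 'Pack'],
        'Points': [75.0, 45.0, 90.0, 60.0],
    },
    index=pd.Index([4, 3, 2, 1], name='Review #'),
)

# Two columns of a single review.
one = sample_df.loc[3, ['Variety', 'Points']]

# Label slices include both ends; rows run from label 3 down to label 2 here.
span = sample_df.loc[3:2, 'Variety':'Style']

# A list of labels selects exactly those reviews.
some = sample_df.loc[[1, 4], ['Style', 'Points']]

# A bare colon selects all rows.
all_rows = sample_df.loc[:, ['Variety', 'Style']]

print(one, span, some, all_rows, sep='\n\n')
```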

# Task: Conditional Selection

Solve the following tasks to sharpen your selection skills:

* Get everything about ramen that comes in a bowl.
* Get variety, style and points of all ramen that comes from Germany.
* Get the columns from brand up to country of all ramen that comes in a cup and has a rating lower than 10 points.
* Get brand, variety and points of all ramen either produced by _Samyang_ or having a rating over 95 points.
* Get everything but price and recommendation for all ramen containing _Hello Kitty_ in their variety field.
* Get everything up to country for all ramen from the brands _Knorr_, _Vifon_ and _Yum Yum_.
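
The building blocks for these tasks are boolean masks, which can be combined with `&` (and) and `|` (or). A sketch on hypothetical data follows; the rows in `sample_df` are made up, and each condition needs its own pair of parentheses when combined.

```python
import pandas as pd

# Hypothetical mini dataset mimicking some ramen columns.
sample_df = pd.DataFrame({
    'Brand': ['Knorr', 'Samyang', 'Vifon', 'Nissin'],
    'Variety': ['Hello Kitty Curry', 'Hot Chicken', 'Pho Bo', 'Cup Noodles'],
    'Style': ['Cup', 'Bowl', 'Bowl', 'Cup'],
    'Points': [5.0, 80.0, 97.0, 60.0],
})

# A single condition.
bowls = sample_df[sample_df['Style'] == 'Bowl']

# Two conditions that must both hold, combined with a column slice.
is_weak_cup = (sample_df['Style'] == 'Cup') & (sample_df['Points'] < 10)
weak_cups = sample_df.loc[is_weak_cup, 'Brand':'Style']

# Either condition may hold.
is_match = (sample_df['Brand'] == 'Samyang') | (sample_df['Points'] > 95)
either = sample_df.loc[is_match, ['Brand', 'Points']]

# Substring match on a text column.
kitty = sample_df[sample_df['Variety'].str.contains('Hello Kitty')]

# Membership in a list of values.
brands = sample_df[sample_df['Brand'].isin(['Knorr', 'Vifon'])]

print(bowls, weak_cups, either, kitty, brands, sep='\n\n')
```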

# Task: Grouping And Sorting

Use grouping to solve the following tasks:

* Print out the number of reviews per brand in ascending order.
* Print out all the styles and their mean rating points - in descending order.
* Print out the minimum, maximum and mean price per brand, sorted by the maximum values.
* Print out all the values of the highest rated ramen per style!
* Print out the count and mean of ratings per style per brand, sorted by the count.
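
The grouping patterns behind these tasks can be sketched on a small hypothetical dataframe. All values in `sample_df` are made up, and using `idxmax` to pick the best row per group is one possible approach among several.

```python
import pandas as pd

# Hypothetical mini dataset mimicking some processed ramen columns.
sample_df = pd.DataFrame({
    'Brand': ['Nissin', 'Nissin', 'Vifon', 'Knorr', 'Nissin'],
    'Style': ['Cup', 'Pack', 'Bowl', 'Pack', 'Cup'],
    'Points': [60.0, 80.0, 70.0, 40.0, 100.0],
    'Price': [1.29, 0.79, 1.79, 0.79, 1.29],
})

# Number of rows per group, ascending.
reviews_per_brand = sample_df.groupby('Brand').size().sort_values()

# Mean of one column per group, descending.
mean_points = sample_df.groupby('Style')['Points'].mean().sort_values(ascending=False)

# Several aggregations at once, sorted by one of the resulting columns.
price_stats = sample_df.groupby('Brand')['Price'].agg(['min', 'max', 'mean']).sort_values('max')

# Full row of the highest rated entry per group: idxmax yields the row labels.
best_per_style = sample_df.loc[sample_df.groupby('Style')['Points'].idxmax()]

# Grouping by two columns produces a two-level index.
per_style_brand = (
    sample_df.groupby(['Style', 'Brand'])['Points']
    .agg(['count', 'mean'])
    .sort_values('count')
)

print(reviews_per_brand, mean_points, price_stats, best_per_style, per_style_brand, sep='\n\n')
```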

# Good Job!