# Roller Coaster

#### Overview

This project is slightly different from others you have encountered thus far. Instead of a step-by-step tutorial, this project contains a series of open-ended requirements that describe the project you'll be building. There are many possible ways to correctly fulfill these requirements, and you should expect to use the internet, Codecademy, and other resources when you encounter a problem that you cannot easily solve.

#### Project Goals

You will work to create several data visualizations that will give you insight into the world of roller coasters.

## Prerequisites

To complete this project, you should have completed the first two lessons in the [Data Analysis with Pandas course](https://www.codecademy.com/learn/data-processing-pandas) and the first two lessons in the [Data Visualization in Python course](https://www.codecademy.com/learn/data-visualization-python). This content is also covered in the [Data Scientist Career Path](https://www.codecademy.com/learn/paths/data-science/).

## Project Requirements

1. Roller coasters are thrilling amusement park rides designed to make you squeal and scream! They take you up high, drop you to the ground quickly, and sometimes even spin you upside down before returning to a stop. Today you will be taking control back from the roller coasters and visualizing data covering international roller coaster rankings and roller coaster statistics.

   Roller coasters are often split into two main categories based on their construction material: **wood** or **steel**. Rankings for the best wood and steel roller coasters from the 2013 to 2018 [Golden Ticket Awards](http://goldenticketawards.com) are provided in `'Golden_Ticket_Award_Winners_Wood.csv'` and `'Golden_Ticket_Award_Winners_Steel.csv'`, respectively. Load each CSV into a DataFrame and inspect it to gain familiarity with the data.

In [1]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
# Load the wood rankings and inspect them
wood = pd.read_csv('Golden_Ticket_Award_Winners_Wood.csv')
print(wood.head())
print(wood.describe())
# Load the steel rankings and inspect them
steel = pd.read_csv('Golden_Ticket_Award_Winners_Steel.csv')
print(steel.head())
print(steel.describe())

2. Write a function that will plot the ranking of a given roller coaster over time as a line. Your function should take a roller coaster's name and a ranking DataFrame as arguments. Make sure to include informative labels that describe your visualization.

   Call your function with `"El Toro"` as the roller coaster name and the wood ranking DataFrame. What issue do you notice? Update your function with an additional argument to alleviate the problem, and retest your function.

In [2]:
# Create a function to plot rankings over time for 1 roller coaster.
# The park argument distinguishes coasters that share a name.
def rank_year(name, park, df):
  coaster = df[(df['Name'] == name) & (df['Park'] == park)]
  plt.plot(coaster['Year of Rank'], coaster['Rank'])
  plt.title(f'{name} Rankings')
  plt.ylabel('Rank')
  plt.xlabel('Year')
  plt.legend([name], loc=1)
  plt.show()

# The function draws the plot itself and returns nothing, so call it directly
rank_year('El Toro', 'Six Flags Great Adventure', wood)
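
One way to see the issue that step 2 hints at is to check whether a coaster name appears at more than one park. The sketch below is only an illustration: it reuses the `Name` and `Park` column names from the rankings files, but the rows themselves are invented placeholder values, not real rankings.

```python
import pandas as pd

# Toy stand-in for a rankings DataFrame; every value here is invented.
rankings = pd.DataFrame({
    'Name': ['El Toro', 'El Toro', 'El Toro', 'Boulder Dash'],
    'Park': ['Park A', 'Park B', 'Park A', 'Park C'],
    'Rank': [1, 9, 2, 4],
    'Year of Rank': [2013, 2013, 2014, 2013],
})

# Count how many distinct parks each coaster name appears at.
parks_per_name = rankings.groupby('Name')['Park'].nunique()

# Names found at more than one park cannot be identified by name alone.
ambiguous = parks_per_name[parks_per_name > 1]
print(ambiguous)
```

Running the same two lines against the real `wood` or `steel` DataFrame should show which names need a park to disambiguate them.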

3. Write a function that will plot the ranking of two given roller coasters over time as lines. Your function should take both roller coasters' names and a ranking DataFrame as arguments. Make sure to include informative labels that describe your visualization.

   Call your function with `"El Toro"` as one roller coaster name, `"Boulder Dash"` as the other roller coaster name, and the wood ranking DataFrame. What issue do you notice? Update your function with two additional arguments to alleviate the problem, and retest your function.

In [3]:
# Create a function to plot rankings over time for 2 roller coasters.
# The park arguments distinguish coasters that share a name.
def rank_year2(name1, name2, park1, park2, df):
  coaster1 = df[(df['Name'] == name1) & (df['Park'] == park1)]
  coaster2 = df[(df['Name'] == name2) & (df['Park'] == park2)]
  ax = plt.subplot()
  ax.plot(coaster1['Year of Rank'], coaster1['Rank'])
  ax.plot(coaster2['Year of Rank'], coaster2['Rank'])
  plt.title(f'{name1} vs. {name2} Rankings')
  plt.ylabel('Rank')
  plt.xlabel('Year')
  plt.legend([name1, name2], loc=1)
  # Ranks are whole numbers, so use integer ticks up to the worst rank shown
  worst_rank = int(max(coaster1['Rank'].max(), coaster2['Rank'].max()))
  ax.set_yticks(range(1, worst_rank + 1))
  plt.show()

rank_year2('El Toro', 'Boulder Dash', 'Six Flags Great Adventure', 'Lake Compounce', wood)

4. Write a function that will plot the ranking of the top `n` ranked roller coasters over time as lines. Your function should take a number `n` and a ranking DataFrame as arguments. Make sure to include informative labels that describe your visualization.

   For example, if `n == 5`, your function should plot a line for each roller coaster that has a rank of `5` or better (that is, `Rank <= 5`).
   
   Call your function with a value of `n` and either the wood ranking or steel ranking DataFrame.

In [4]:
# Create a plot of top n rankings over time
def top_ranking(df, n):
  top = df[df['Rank'] <= n]
  fig, ax = plt.subplots(figsize=(10, 10))
  for coaster in set(top['Name']):
    coaster_rankings = top[top['Name'] == coaster]
    ax.plot(coaster_rankings['Year of Rank'], coaster_rankings['Rank'], label=coaster)
  # Label and show the figure once, after every line has been drawn
  ax.set_yticks(range(1, n + 1))
  plt.title(f'Top {n} Rankings')
  plt.xlabel('Year')
  plt.ylabel('Ranking')
  plt.legend(loc=4)
  plt.show()

top_ranking(wood, 5)
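
A possible refinement for step 4, sketched here on invented data: group by both `Name` and `Park` so that same-named coasters at different parks get separate lines, and flip the y-axis so rank 1 sits at the top. The function name `plot_top_n` and all row values are hypothetical, and the `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Toy rankings; every value is invented for illustration.
rankings = pd.DataFrame({
    'Name': ['Coaster A', 'Coaster A', 'Coaster A', 'Coaster B', 'Coaster B', 'Coaster C'],
    'Park': ['Park 1', 'Park 2', 'Park 1', 'Park 3', 'Park 3', 'Park 4'],
    'Rank': [1, 2, 1, 3, 2, 9],
    'Year of Rank': [2013, 2013, 2014, 2013, 2014, 2013],
})

def plot_top_n(df, n):
    """Plot one line per (Name, Park) pair that was ranked n or better."""
    top = df[df['Rank'] <= n]
    fig, ax = plt.subplots()
    for (name, park), group in top.groupby(['Name', 'Park']):
        group = group.sort_values('Year of Rank')
        ax.plot(group['Year of Rank'], group['Rank'],
                marker='o', label=f'{name} ({park})')
    ax.set_yticks(range(1, n + 1))
    ax.invert_yaxis()  # show rank 1 at the top of the chart
    ax.set_title(f'Top {n} Rankings')
    ax.set_xlabel('Year')
    ax.set_ylabel('Rank')
    ax.legend()
    return ax

ax = plot_top_n(rankings, 3)
```

Whether an inverted axis reads better is a matter of taste; the point is only that the tick range and title can follow `n` instead of being hard-coded.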

5. Now that you've visualized rankings over time, let's dive into the actual statistics of roller coasters themselves. [Captain Coaster](https://captaincoaster.com/en/) is a popular site for recording roller coaster information. Data on all roller coasters documented on Captain Coaster has been accessed through its API and stored in `roller_coasters.csv`. Load the data from the CSV into a DataFrame and inspect it to gain familiarity with the data.

In [5]:
# Load roller coaster data and inspect it
coasters = pd.read_csv('roller_coasters.csv')
print(coasters.head())

6. Write a function that plots a histogram of any numeric column of the roller coaster DataFrame. Your function should take a DataFrame and a column name for which a histogram should be constructed as arguments. Make sure to include informative labels that describe your visualization.

   Call your function with the roller coaster DataFrame and one of the column names.

In [6]:
def hist_roller(df, column):
  # Drop missing values so the bin range can be computed from the data
  plt.hist(df[column].dropna())
  plt.title(f'Distribution of {column}')
  plt.xlabel(column)
  plt.ylabel('Number of Roller Coasters')
  plt.show()

hist_roller(coasters, 'speed')

7. Write a function that creates a bar chart showing the number of inversions for each roller coaster at an amusement park. Your function should take the roller coaster DataFrame and an amusement park name as arguments. Make sure to include informative labels that describe your visualization.

   Call your function with the roller coaster DataFrame and amusement park name.

In [7]:
def bar_park(df, park):
  park_df = df[df['park'] == park]
  roller_coaster = park_df['name']
  inversions = park_df['num_inversions']
  plt.figure(figsize=(20, 15))
  ax = plt.subplot()
  ax.bar(range(len(roller_coaster)), inversions)
  # Put one tick per coaster and label it with the coaster's name
  ax.set_xticks(range(len(roller_coaster)))
  ax.set_xticklabels(roller_coaster)
  plt.xticks(rotation=45)
  plt.title(f'Number of Inversions per Roller Coaster at {park}')
  plt.xlabel('Roller Coaster')
  plt.ylabel('Number of Inversions')
  plt.show()

bar_park(coasters, 'Walibi Belgium')

8. Write a function that creates a pie chart that compares the number of operating roller coasters (`'status.operating'`) to the number of closed roller coasters (`'status.closed.definitely'`). Your function should take the roller coaster DataFrame as an argument. Make sure to include informative labels that describe your visualization.

   Call your function with the roller coaster DataFrame.

In [8]:
def pie(coasters):
  df_operating = coasters[coasters['status'] == 'status.operating']
  df_closed = coasters[coasters['status'] == 'status.closed.definitely']
  count = [len(df_operating), len(df_closed)]
  labelsdata = ['Operating', 'Closed']
  plt.pie(count, autopct='%0.1f%%', labels=labelsdata)
  plt.axis('equal')
  plt.title('Operating vs. Closed Roller Coasters')
  plt.show()

pie(coasters)
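
As an alternative to filtering the DataFrame once per status, the two counts for step 8 could be taken from a single `value_counts()` call. This is a sketch on invented rows: the two status strings come from the requirement above, while `'status.other'` is a made-up placeholder for any status the pie chart should ignore.

```python
import pandas as pd

# Invented rows; 'status.other' is a hypothetical placeholder value.
status_demo = pd.DataFrame({
    'status': [
        'status.operating',
        'status.operating',
        'status.closed.definitely',
        'status.operating',
        'status.other',
    ]
})

# Keep only the two statuses of interest, in a fixed order.
wanted = ['status.operating', 'status.closed.definitely']
counts = status_demo['status'].value_counts().reindex(wanted, fill_value=0)
print(counts)
```

The resulting `counts` Series can be passed straight to `plt.pie()` in place of a hand-built list.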

9. `.scatter()` is another useful function in matplotlib that you might not have seen before. `.scatter()` produces a scatter plot, which is similar to `.plot()` in that it plots points on a figure. `.scatter()`, however, does not connect the points with a line. This allows you to analyze the relationship between two variables. Find [`.scatter()`'s documentation here](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html).

   Write a function that creates a scatter plot of two numeric columns of the roller coaster DataFrame. Your function should take the roller coaster DataFrame and two column names as arguments. Make sure to include informative labels that describe your visualization.

   Call your function with the roller coaster DataFrame and two column names.

In [9]:
def scatter(df, column1, column2):
  plt.figure(figsize=(10, 10))
  ax = plt.subplot()
  # Plot one column against the other so each point is a single roller coaster
  ax.scatter(df[column1], df[column2], alpha=0.5)
  ax.set_title(f'{column2} vs. {column1}')
  ax.set_xlabel(column1)
  ax.set_ylabel(column2)
  plt.show()

scatter(coasters, 'speed', 'height')
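
To make the difference between `.plot()` and `.scatter()` from step 9 concrete, the sketch below draws the same points both ways, side by side. The `speed` and `height` values are invented, and the `Agg` backend is an assumption made so the example runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Invented measurements, deliberately not sorted by speed.
demo = pd.DataFrame({
    'speed': [80, 40, 120, 60],
    'height': [35, 12, 70, 20],
})

fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# .plot() joins the points in row order, which zigzags for unsorted data.
lines = ax_line.plot(demo['speed'], demo['height'])
ax_line.set_title('.plot(): points connected')

# .scatter() draws each row as an independent point.
points = ax_scatter.scatter(demo['speed'], demo['height'], alpha=0.5)
ax_scatter.set_title('.scatter(): points only')

for ax in (ax_line, ax_scatter):
    ax.set_xlabel('speed')
    ax.set_ylabel('height')
```

Because the rows are not ordered by speed, the connecting line in the left panel carries no meaning, which is why a scatter plot suits this kind of two-variable comparison.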

10. Part of the fun of data analysis and visualization is digging into the data you have and answering questions that come to your mind.

    Some questions you might want to answer with the datasets provided include:
    - What roller coaster seating type is most popular? And do different seating types result in higher/faster/longer roller coasters?
    - Do roller coaster manufacturers have any specialties (do they focus on speed, height, seating type, or inversions)?
    - Do amusement parks have any specialties?
    
    What visualizations can you create that answer these questions, and any others that come to you? Share the questions you ask and the accompanying visualizations you create on the Codecademy forums.
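
As a starting point for the seating-type question, a grouped summary can show both how common each type is and how its average statistics compare. This is a minimal sketch on invented rows; the column name `seating_type` is an assumption about `roller_coasters.csv`, so check the real column names before reusing it.

```python
import pandas as pd

# Invented rows; 'seating_type' is an assumed column name.
coasters_demo = pd.DataFrame({
    'seating_type': ['Sit Down', 'Sit Down', 'Inverted', 'Flying', 'Inverted', 'Sit Down'],
    'speed': [80.0, 100.0, 90.0, 70.0, None, 60.0],
    'height': [30.0, 50.0, 35.0, 25.0, 40.0, 20.0],
})

# How many coasters use each seating type?
type_counts = coasters_demo['seating_type'].value_counts()

# Average speed and height per seating type (missing values are skipped).
type_means = coasters_demo.groupby('seating_type')[['speed', 'height']].mean()

print(type_counts)
print(type_means)
```

Either result could then feed a bar chart, for example one bar per seating type showing its mean speed.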

## Solution

Great work! Visit [our forums](https://discuss.codecademy.com/t/roller-coaster-challenge-project-python-pandas/462378) or the file **Roller Coaster_Solution.ipynb** to compare your project to our sample solution code. You can also learn how to host your own solution on GitHub so you can share it with other learners! Your solution might look different from ours, and that's okay! There are multiple ways to solve these projects, and you'll learn more by seeing others' code.