# Programming for Data Analytics - Project - Gerard Ball

# Brief - 
>Problem statement
For this project you must create a data set by simulating a real-world phenomenon of your choosing. You may pick any phenomenon you wish – you might pick one that is of interest to you in your personal or professional life. such data using Python. We suggest you use the numpy.random package for this purpose. Then, rather than collect data related to the phenomenon, you should model and synthesise 
Specifically, in this project you should:
• Choose a real-world phenomenon that can be measured and for which you could collect at least one-hundred data points across at least four different variables.
• Investigate the types of variables involved, their likely distributions, and their relationships with each other.
• Synthesise/simulate a data set as closely matching their properties as possible.
• Detail your research and implement the simulation in a Jupyter notebook – the data set itself can simply be displayed in an output cell within the notebook.
Note that this project is about simulation – you must synthesise a data set. Some students may already have some real-world data sets in their own files. It is okay to base your synthesised data set on these should you wish (please reference it if you do), but the main task in this project is to create a synthesised data set. The next section gives an example project idea.




## Road Map
1. Introduction 
2. Aim
3. Images of subjects
4. Data Collection
5. Data Synthesis
6. Exploratory Data Analysis
7. Data Visualisation
8. Statistical Analysis
9. Interpretations of results and Discussions
10. Conclusion


## Introduction

Roller coasters offer a budding adrenaline junkie a release from the trials and tribulations of everyday life. Like many things in life, coasters come in all manner of sizes and types and understanding and discerning the relatsionsip between much of them, can offer up some valuable data analysis. The mission of this project is to simulate and synthesize a diverse these roller coasters, capturing variables like speed, height, type and thrill rating. By carrying out this synthesis, the prtoject aims to create a comprehensive and varied representation of roller coasters worldwide and their many types. The dataset will serve as a valuable resource for analysis, providing insights into the relationships between different coaster characteristics. By leveraging this simulated data, I strive to enhance understanding and appreciation of the factors contributing to the thrill and excitement offered by these wonderful marvels of modern engineering whilst facilitating potential insights for enthusiasts, theme park planners and the amusement industry itself. 

## Data Collection

In [9]:
import pandas as pd

file_path = 'Coasters.csv'

# Read the CSV file into DataFrame
coasters_df = pd.read_csv(file_path)

# Display the DataFrame
print(coasters_df.to_string())

                    Roller Coaster    Type  Speed (km/h)  Height (m)  Thrill Rating
0                         Hyperion   Steel           142          77            9.2
1                            Zadra  Hybrid           121          63            8.7
2                      Dragon Khan   Steel           110          45            8.5
3                     Furious Baco   Steel           134          35            8.8
4                        Shambhala   Steel           134          76            9.0
5                        Red Force   Steel           180         112            9.5
6                         Maverick   Steel           115          32            8.3
7                 Millennium Force   Steel           150          94            9.7
8                          El Toro  Wooden           113          55            8.6
9                 Twisted Colossus  Hybrid           104          39            8.4
10                           Taron   Steel           117          30        

In [26]:
import numpy as np
import pandas as pd

np.random.seed(42)

num_data_points = 100

# Type
roller_coaster_type = 'Steel'

# Mean and standard deviation for 'Steel' type
type_means = {'Steel': 120}
type_stds = {'Steel': 20}

# Generate data for 'Steel' type
speed = np.random.normal(type_means[roller_coaster_type], type_stds[roller_coaster_type], size=num_data_points)
height = np.random.normal(50, 30, size=num_data_points)  # Adjust mean and std as needed
thrill_rating = np.random.normal(8.5, 0.8, size=num_data_points)  # Adjust mean and std as needed

# Create DataFrame
roller_coaster_data = pd.DataFrame({
    'Roller Coaster Type': roller_coaster_type,
    'Speed (km/h)': speed,
    'Height (m)': height,
    'Thrill Rating': thrill_rating
})

# Display summary
mean_speed = np.mean(speed)
std_speed = np.std(speed)
mean_height = np.mean(height)
std_height = np.std(height)
mean_thrill = np.mean(thrill_rating)
std_thrill = np.std(thrill_rating)

print(f"Summary for Roller Coaster Type '{roller_coaster_type}':")
print(f"Mean Speed (km/h): {mean_speed}, Standard Deviation: {std_speed}")
print(f"Mean Height (m): {mean_height}, Standard Deviation: {std_height}")
print(f"Mean Thrill Rating: {mean_thrill}, Standard Deviation: {std_thrill}")

# Display the DataFrame
roller_coaster_data.head()



Summary for Roller Coaster Type 'Steel':
Mean Speed (km/h): 117.92306965211812, Standard Deviation: 18.072323532892593
Mean Height (m): 50.66913761149771, Standard Deviation: 28.466659215319385
Mean Thrill Rating: 8.551917002480359, Standard Deviation: 0.8630782769963261


Unnamed: 0,Roller Coaster Type,Speed (km/h),Height (m),Thrill Rating
0,Steel,129.934283,7.538878,8.78623
1,Steel,117.234714,37.38064,8.948628
2,Steel,132.953771,39.718565,9.366441
3,Steel,150.460597,25.931682,9.343042
4,Steel,115.316933,45.161429,7.397865


In [27]:
import numpy as np
import pandas as pd

np.random.seed(42)

num_data_points_wooden = 50  # Adjust the number of data points as needed

# Mean and standard deviation for Wooden type
wooden_means = {'Speed (km/h)': 80, 'Height (m)': 30, 'Thrill Rating': 8.0}
wooden_stds = {'Speed (km/h)': 15, 'Height (m)': 10, 'Thrill Rating': 0.8}

# Generate data for Wooden type separately
wooden_data = pd.DataFrame({
    'Roller Coaster Type': 'Wooden',
    'Speed (km/h)': np.random.normal(wooden_means['Speed (km/h)'], wooden_stds['Speed (km/h)'], size=num_data_points_wooden),
    'Height (m)': np.random.normal(wooden_means['Height (m)'], wooden_stds['Height (m)'], size=num_data_points_wooden),
    'Thrill Rating': np.random.normal(wooden_means['Thrill Rating'], wooden_stds['Thrill Rating'], size=num_data_points_wooden)
})

# Display the summary statistics for the Wooden type
wooden_summary = wooden_data.describe().loc[['mean', 'std']].rename(index={'mean': 'Mean', 'std': 'Standard Deviation'})
print(f"Summary for Roller Coaster Type 'Wooden':\n{wooden_summary}")

Summary for Roller Coaster Type 'Wooden':
                    Speed (km/h)  Height (m)  Thrill Rating
Mean                   76.617891   30.177809       7.968570
Standard Deviation     14.005032    8.743250       0.812331


In [39]:
import pandas as pd
import numpy as np
# will have to edit these to get closer to what i want
means = {
    'Steel': {'Speed': 118, 'Height': 50, 'Thrill Rating': 8.5},
    'Wooden': {'Speed': 94, 'Height': 30, 'Thrill Rating': 8.0},
    'Hybrid': {'Speed': 109, 'Height': 40, 'Thrill Rating': 8.3}
}

ranges = {
    'Steel': {'Speed': 20, 'Height': 30, 'Thrill Rating': 1.0},
    'Wooden': {'Speed': 15, 'Height': 20, 'Thrill Rating': 1.0},
    'Hybrid': {'Speed': 18, 'Height': 25, 'Thrill Rating': 1.0}
}

# Number of data points
num_data_points = 100

# Generate synthetic data
roller_coaster_types = np.random.choice(['Steel', 'Wooden', 'Hybrid'], size=num_data_points)

roller_coaster_data = pd.DataFrame()

for coaster_type in roller_coaster_types:
    data = {
        'Roller Coaster Type': coaster_type,
        'Speed (km/h)': np.random.normal(means[coaster_type]['Speed'], ranges[coaster_type]['Speed'], 1),
        'Height (m)': np.random.normal(means[coaster_type]['Height'], ranges[coaster_type]['Height'], 1),
        'Thrill Rating': np.random.normal(means[coaster_type]['Thrill Rating'], ranges[coaster_type]['Thrill Rating'], 1)
    }
    roller_coaster_data = roller_coaster_data.append(data, ignore_index=True)

# Display the synthetic data
print(roller_coaster_data)

   Roller Coaster Type          Speed (km/h)             Height (m)  \
0                Steel   [152.3151201601249]    [73.15645985106296]   
1               Wooden  [57.173124614165715]  [-0.8892898721293507]   
2               Wooden   [62.17666087916774]   [35.859870549039115]   
3               Hybrid  [110.65250394245211]   [41.596407533353585]   
4               Wooden   [88.08367229382219]   [21.454287487845356]   
..                 ...                   ...                    ...   
95              Hybrid  [130.70668253709857]    [32.32262788222247]   
96              Hybrid  [114.24609861781252]    [21.73295090490548]   
97              Wooden   [88.30363990029637]   [28.247234843670444]   
98               Steel  [129.76399869943714]    [65.26004748761407]   
99               Steel  [116.05383622455946]    [4.717342597797122]   

           Thrill Rating  
0    [8.019053872741909]  
1    [8.503429082601878]  
2    [6.934436275134974]  
3    [8.420589933890712]  
4   [7.27920

  roller_coaster_data = roller_coaster_data.append(data, ignore_index=True)
  roller_coaster_data = roller_coaster_data.append(data, ignore_index=True)
  roller_coaster_data = roller_coaster_data.append(data, ignore_index=True)
  roller_coaster_data = roller_coaster_data.append(data, ignore_index=True)
  roller_coaster_data = roller_coaster_data.append(data, ignore_index=True)
  roller_coaster_data = roller_coaster_data.append(data, ignore_index=True)
  roller_coaster_data = roller_coaster_data.append(data, ignore_index=True)
  roller_coaster_data = roller_coaster_data.append(data, ignore_index=True)
  roller_coaster_data = roller_coaster_data.append(data, ignore_index=True)
  roller_coaster_data = roller_coaster_data.append(data, ignore_index=True)
  roller_coaster_data = roller_coaster_data.append(data, ignore_index=True)
  roller_coaster_data = roller_coaster_data.append(data, ignore_index=True)
  roller_coaster_data = roller_coaster_data.append(data, ignore_index=True)
  roller_coa