Problem statement
For this project you must create a data set by simulating a real-world phenomenon of your choosing. You may pick any phenomenon you wish – you might pick one that is of interest to you in your personal or professional life. Then, rather than collect data related to the phenomenon, you should model and synthesise such data using Python. We suggest you use the numpy.random package for this purpose.

Specifically, in this project you should:

• Choose a real-world phenomenon that can be measured and for which you could
    collect at least one-hundred data points across at least four different variables.
    
• Investigate the types of variables involved, their likely distributions, and their
    relationships with each other.
    
• Synthesise/simulate a data set as closely matching their properties as possible.

• Detail your research and implement the simulation in a Jupyter notebook – the
    data set itself can simply be displayed in an output cell within the notebook.

Note that this project is about simulation – you must synthesise a data set. Some students may already have some real-world data sets in their own files. It is okay to base your synthesised data set on these should you wish (please reference it if you do), but the main task in this project is to create a synthesised data set. The next section gives an example project idea.

Create a dataset with four variables
Three variables will be selected from data arrays; the fourth variable will be calculated from the three chosen variables.
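The plan above can be sketched roughly as follows. The three input columns mirror arrays generated later in this notebook, but the `Derived_score` column, its formula, and the weights in `ability_weight` are hypothetical placeholders chosen only to show a fourth variable being computed from the other three. They are not a researched model. The seed is likewise an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Seeded only so that this sketch is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(42)
n = 100

# Three variables drawn from ranges or arrays of candidate values.
df = pd.DataFrame({
    "Temperature": rng.integers(-4, 41, n),   # integers from -4 to 40
    "Age": rng.integers(8, 85, n),            # integers from 8 to 84
    "Swimming_ability": rng.choice(
        ["Beginner", "Mediocre", "Advanced"], n, p=[0.54, 0.35, 0.11]
    ),
})

# Hypothetical fourth variable calculated from the other three.
# These weights are illustrative assumptions, not researched values.
ability_weight = {"Beginner": 3, "Mediocre": 2, "Advanced": 1}
df["Derived_score"] = (
    df["Swimming_ability"].map(ability_weight)
    * (1 + df["Temperature"].clip(lower=0) / 40)
    * (1 + (df["Age"] < 15).astype(int))
)

print(df.head())
```

Keeping the derived column as a plain expression over the other columns makes it easy to swap in a better-motivated formula once the relationships between the variables have been researched.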

Likewise, I investigate the other four variables, and I also look at the relationships between the variables. I devise an algorithm (or method) to generate such a data set, simulating values of the four variables for two-hundred students. I detail all this work in my notebook, and then I add some code in to generate a data set with those properties.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [7]:
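# Create a random number generator (unseeded, so results differ on each run).
rng = np.random.default_rng()
# Draw 100 integer temperatures; the upper bound is exclusive, so values range from -4 to 40.
Temperature = rng.integers(-4, 41, 100)
print(Temperature)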

[34 15 11 39 -4  8 25 37 35 34 29  5 29 -3  5 22 35 31 26 11 27 37 -1 36
 -3 11 21 13 37 22 13 28  4 15 -2 23 25 26  2  1 10 32  0 33  9 26  2  4
  1  2 10 14 -4  8  3 -4 28 40  1 26 15  9 40 14  2  8 22 24  6 -4 -1 24
 34  8  6 29 14 27 12 23 35 11 17 39 20 20 25 40  9  5  1 28 -4 10 10 -2
  3 -4 26 25]
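Because `default_rng()` is called without a seed, the values printed above change on every run. A minimal sketch of making the draw repeatable is shown below. The seed `2021` is an arbitrary assumption, and `endpoint=True` is included only as an alternative way of writing the same inclusive range.

```python
import numpy as np

# Two generators built from the same (arbitrary) seed produce identical draws.
rng = np.random.default_rng(2021)
temperature = rng.integers(-4, 41, 100)

rng_again = np.random.default_rng(2021)
repeat = rng_again.integers(-4, 41, 100)
print(np.array_equal(temperature, repeat))  # True

# The upper bound of integers() is exclusive by default, so 41 never appears.
print(temperature.min(), temperature.max())

# Equivalent range with the upper bound written inclusively.
inclusive = rng.integers(-4, 40, 100, endpoint=True)
```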


In [17]:
# Draw 100 states with equal probability, reusing the generator created above.
State = rng.choice(['Arizona', 'Kentucky', 'Florida', 'California'], 100)
print(State)

['Arizona' 'Florida' 'Arizona' 'Kentucky' 'Arizona' 'Arizona' 'California'
 'Kentucky' 'California' 'Arizona' 'Florida' 'Kentucky' 'Kentucky'
 'Kentucky' 'Florida' 'Florida' 'California' 'California' 'Florida'
 'Florida' 'Arizona' 'California' 'Florida' 'Kentucky' 'Arizona'
 'Kentucky' 'Florida' 'Arizona' 'Florida' 'Arizona' 'Arizona' 'Florida'
 'Kentucky' 'Arizona' 'Florida' 'California' 'Florida' 'Florida' 'Arizona'
 'Florida' 'Arizona' 'Florida' 'California' 'California' 'Arizona'
 'Kentucky' 'Arizona' 'Florida' 'Kentucky' 'Florida' 'Kentucky'
 'California' 'California' 'California' 'Florida' 'Arizona' 'Florida'
 'Kentucky' 'Kentucky' 'Kentucky' 'Florida' 'Arizona' 'Arizona' 'Florida'
 'Florida' 'Kentucky' 'California' 'Arizona' 'California' 'Florida'
 'Kentucky' 'Arizona' 'Arizona' 'California' 'California' 'California'
 'Florida' 'Kentucky' 'Arizona' 'California' 'California' 'Kentucky'
 'Kentucky' 'Kentucky' 'Florida' 'Florida' 'Florida' 'Florida' 'Florida'
 'Arizona' 'Kentucky' 
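With no `p` argument, `choice` draws each state with equal probability, so each one should appear roughly 25 times in 100 draws. A quick sketch of checking this with `np.unique` follows. The seed is an arbitrary assumption added for repeatability, and the lowercase names `states` and `sample` are used only in this illustration.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed, for repeatability only
states = ['Arizona', 'Kentucky', 'Florida', 'California']
sample = rng.choice(states, 100)

# Count how often each state was drawn.
labels, counts = np.unique(sample, return_counts=True)
for label, count in zip(labels, counts):
    print(f"{label}: {count}")
```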

In [18]:
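# Re-create the generator so this cell can run on its own.
rng = np.random.default_rng()
# Draw 100 integer ages; the upper bound is exclusive, so values range from 8 to 84.
Age = rng.integers(8, 85, 100)
print(Age)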

[19 46 12 38 17 42 15 58 84 61 54 49 36 38 16 81 45 23 13 51 48 79 73  8
 56 45 79 60 46 23 37 41 16 66 58 79 13 75 60 66 36  8 25 71 14 19 33 13
 60 74 62 22 13 76 79 70 11 41 17  9 67 53 38 76 37 58 76 49 55 23 26 43
 76 40 32 11 61 71 54 78  8 54 61 46 36 34 39 22  9 56 41 30 34 75 16 67
 70 25 72 14]


In [16]:
# Weighted draw of 100 ability levels: 54% Beginner, 35% Mediocre, 11% Advanced.
Swimming_ability = rng.choice(['Beginner', 'Mediocre', 'Advanced'], 100, p=[0.54, 0.35, 0.11])
print(Swimming_ability)

# Reference:
# https://www.redcross.org/about-us/news-and-events/press-release/red-cross-launches-campaign-to-cut-drowning-in-half-in-50-cities.html#:~:text=Overall%2C%20the%20survey%20finds%20that,to%2051%20percent%20of%20whites.

['Beginner' 'Beginner' 'Mediocre' 'Beginner' 'Beginner' 'Beginner'
 'Mediocre' 'Beginner' 'Beginner' 'Mediocre' 'Mediocre' 'Mediocre'
 'Mediocre' 'Beginner' 'Advanced' 'Mediocre' 'Beginner' 'Beginner'
 'Beginner' 'Beginner' 'Beginner' 'Beginner' 'Mediocre' 'Beginner'
 'Beginner' 'Mediocre' 'Beginner' 'Mediocre' 'Advanced' 'Beginner'
 'Beginner' 'Mediocre' 'Beginner' 'Beginner' 'Advanced' 'Mediocre'
 'Beginner' 'Advanced' 'Beginner' 'Beginner' 'Beginner' 'Mediocre'
 'Beginner' 'Beginner' 'Advanced' 'Mediocre' 'Beginner' 'Beginner'
 'Beginner' 'Beginner' 'Beginner' 'Mediocre' 'Beginner' 'Advanced'
 'Mediocre' 'Mediocre' 'Advanced' 'Beginner' 'Beginner' 'Beginner'
 'Beginner' 'Beginner' 'Mediocre' 'Mediocre' 'Beginner' 'Beginner'
 'Beginner' 'Beginner' 'Advanced' 'Beginner' 'Beginner' 'Beginner'
 'Beginner' 'Advanced' 'Beginner' 'Mediocre' 'Beginner' 'Mediocre'
 'Beginner' 'Beginner' 'Beginner' 'Beginner' 'Mediocre' 'Mediocre'
 'Beginner' 'Beginner' 'Mediocre' 'Beginner' 'Advanced' 'Begin