# Simulating Time Series Data

So far, we have seen where to find **time series** data and how to process it. Now let's look at how to create **time series** data through simulation.

This chapter is divided into three parts. In the first, we compare **time series** simulations with other types of data simulation and see which new concerns arise once the passage of time has to be taken into account. In the second, we work through some code-based simulations. In the third, we discuss some general trends in **time series** simulation.

We will work through three specific examples of generating **time series** data:
- we will simulate the email opening and donation behavior of members of a non-profit organization over several years;
- we will simulate events in a taxi fleet of a thousand vehicles with various shift start times and passenger pickup frequencies;
- we will simulate, step by step, the evolution of the magnetic state of a solid of a given size at a given temperature, using the relevant laws of physics.

These three examples correspond to three classes of **time series** simulations:
- *heuristic simulations:*
    - we decide how the world should work, make sure the rules are logically consistent, and code them rule by rule;
- *discrete event simulations (DES):*
    - we create individual actors that follow certain rules in our universe and then run these actors to see how the universe evolves over time;
- *simulations based on laws of physics:*
    - we apply the laws of physics to see how a system evolves over time.

### Why is Time Series Simulation Special?

Data simulation is an area of data science that is rarely taught, even though it is an essential skill for working with **time series** data. The need for it follows from one of the downsides of temporal data: no two data points in the same time series are exactly comparable, because they occur at different times. As soon as we want to think about *what could have happened at a given time*, we enter the world of simulation.

### Simulation versus Prediction

Simulation and forecasting are similar practices. In both, we must formulate hypotheses about the dynamics and parameters of the underlying system and then extrapolate from these hypotheses in order to generate data points. However, there are important differences to consider when learning and developing simulations rather than predictions:
- it may be easier to integrate qualitative observations into a simulation than into a prediction;
- simulations are run at scale, so that we can analyze many alternative scenarios, while forecasts must be generated with more care;
- the risks of simulations are lower than those of predictions, as there are no lives or resources at stake. You can therefore be more creative and exploratory in your initial rounds of simulation. Sooner or later, though, you will want to be able to justify how you built your simulations, just as you justify your predictions.
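To make the "run at scale" point concrete, here is a minimal sketch that simulates many paths under several alternative scenarios. Everything in it is an illustrative assumption rather than part of the examples in this chapter: the random walk model, the scenario names, the drift values, and the seed.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, used only for reproducibility

n_paths, n_steps = 1000, 52  # hypothetical: 1,000 simulated paths of 52 weekly steps

# Hypothetical scenarios: the average weekly change assumed under each one
scenarios = {"pessimistic": -0.5, "baseline": 0.0, "optimistic": 0.5}

final_values = {}
for name, drift in scenarios.items():
    # Each row is one simulated path: the cumulative sum of noisy weekly changes
    steps = rng.normal(loc=drift, scale=1.0, size=(n_paths, n_steps))
    paths = steps.cumsum(axis=1)
    final_values[name] = paths[:, -1]  # value at the end of each path

# Compare the spread of outcomes across scenarios
for name, values in final_values.items():
    print(name, round(values.mean(), 2), round(values.std(), 2))
```

Because each scenario costs only a few lines and a fraction of a second, it is cheap to add or change assumptions and rerun, which is much harder to do with a carefully built forecast.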

### Importing Libraries

In [6]:
import numpy as np
import pandas as pd

#### Doing it ourselves

In this first simulation, we write all of the code ourselves, taking care not to specify an illogical order of events.

In [14]:
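# years in which a member could have joined, and the possible membership statuses
years      = ['2014', '2015', '2016', '2017', '2018']

userStatus = ['bronze', 'silver', 'gold', 'inactive']

# draw a join year for each of 1,000 members; later years are more likely
userYears  = np.random.choice(years, 1000,
                             p = [0.1, 0.1, 0.15, 0.30, 0.35])

# draw a status for each member, independently of the join year
userStats  = np.random.choice(userStatus, 1000,
                             p = [0.5, 0.3, 0.1, 0.1])

yearJoined = pd.DataFrame({'yearJoined': userYears})

# despite its name, this column holds each member's status
userJoined = pd.DataFrame({'userJoined': userStats})

yearJoined, userJoined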

Note that many rules and assumptions are already built into the simulation in just these few lines of code. We specified a probability for each year in which a member could have joined, with later years being more likely. We also made each member's status completely independent of the year they joined.
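One way to see these assumptions at work is to inspect the simulated data. The sketch below redraws the same two variables into a single DataFrame and summarizes them. The seeded generator, the combined `members` DataFrame, and the `memberStatus` column name are choices made for this illustration, not part of the cell above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, used only for reproducibility

years = ['2014', '2015', '2016', '2017', '2018']
statuses = ['bronze', 'silver', 'gold', 'inactive']

# Same draws as above, collected in one DataFrame (hypothetical column names)
members = pd.DataFrame({
    'yearJoined': rng.choice(years, 1000, p=[0.1, 0.1, 0.15, 0.30, 0.35]),
    'memberStatus': rng.choice(statuses, 1000, p=[0.5, 0.3, 0.1, 0.1]),
})

# Share of members per join year; should be close to the specified probabilities
year_share = members['yearJoined'].value_counts(normalize=True).sort_index()

# Status distribution within each join year; because status was drawn
# independently of year, the rows should look alike up to sampling noise
status_by_year = pd.crosstab(members['yearJoined'], members['memberStatus'],
                             normalize='index')

print(year_share)
print(status_by_year)
```

If the rows of `status_by_year` look roughly the same for every year, the independence assumption is visible in the data. A more realistic simulation might instead make status depend on how long someone has been a member.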