## Concatenate DataFrames
Let's assume you have gathered multiple datasets about your environment for the same time period, and would like to analyze them.

The first step is to combine the datasets into one, so that you can work with them comfortably.

You'll work on combining the three DataFrames temperature, humidity, and windspeed.

In [None]:
# Rename the columns
temperature.columns = ['temperature']
humidity.columns = ['humidity']
windspeed.columns = ['windspeed']

# Create list of DataFrames
df_list = [temperature, humidity, windspeed]

# Concatenate the DataFrames column-wise (rows are aligned on the index)
environment = pd.concat(df_list, axis=1)

# Print first rows of the DataFrame
print(environment.head())
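
To see what concatenating along axis=1 does with the index, here is a minimal sketch on made-up data. The names temp_toy, hum_toy and combined, and all values, are hypothetical and not part of the exercise. The two toy frames only partly overlap in time, so the default outer join keeps every timestamp and fills the gaps with NaN.

```python
import pandas as pd

# Hypothetical toy data: two 10-minute series whose timestamps only partly overlap
idx_a = pd.date_range("2018-01-01 00:00", periods=3, freq="10min")
idx_b = pd.date_range("2018-01-01 00:10", periods=3, freq="10min")
temp_toy = pd.DataFrame({"temperature": [5.0, 5.2, 5.1]}, index=idx_a)
hum_toy = pd.DataFrame({"humidity": [80.0, 79.0, 78.0]}, index=idx_b)

# axis=1 places the frames side by side and aligns rows on the index;
# timestamps missing from one frame become NaN in that frame's column
combined = pd.concat([temp_toy, hum_toy], axis=1)
print(combined)
```

If you only want timestamps present in every DataFrame, pd.concat also accepts join="inner".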

## Combine and resample
You'll now combine environmental data with a traffic dataset. The traffic dataset consists of two columns, light_veh and heavy_veh.

- heavy_veh represents the number of heavy vehicles, like lorries or buses, per hour on a road of a small city.
- light_veh contains the number of light vehicles, like automobiles or motorbikes, per hour on that road.

The environmental dataset consists of:

- temperature in degrees Celsius.
- humidity in percent.
- sunshine duration in seconds.

Since the traffic dataset is in 1-hour buckets but the environmental data is in 10-minute buckets, you'll need a way to bring both to the same frequency.

The data is available as environ and traffic.

In [None]:
# Combine the DataFrames
environ_traffic = pd.concat([environ, traffic], axis=1)

# Print first 5 rows
print(environ_traffic.head())

# Create agg logic
agg_dict = {
    "temperature": "max",
    "humidity": "max",
    "sunshine": "sum",
    "light_veh": "sum",
    "heavy_veh": "sum",
}

# Resample the DataFrame 
environ_traffic_resampled = environ_traffic.resample('1h').agg(agg_dict)
print(environ_traffic_resampled.head())
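
The effect of combining two frequencies and then resampling can be checked on a small made-up example. This is only a sketch: environ_toy, traffic_toy and all values are hypothetical, and only a subset of the real columns is used. After the concat, the hourly column is NaN on every row that is not a full hour; resampling to 1 hour with a per-column aggregation collapses the six 10-minute rows of each hour into one.

```python
import pandas as pd

# Hypothetical toy data: two hours of 10-minute environmental readings
env_idx = pd.date_range("2018-01-01 00:00", periods=12, freq="10min")
environ_toy = pd.DataFrame(
    {"temperature": [float(i) for i in range(12)], "sunshine": [600] * 12},
    index=env_idx,
)

# Hypothetical toy data: hourly vehicle counts for the same two hours
trf_idx = pd.date_range("2018-01-01 00:00", periods=2, freq="1h")
traffic_toy = pd.DataFrame({"light_veh": [100, 150]}, index=trf_idx)

# light_veh is NaN on the ten rows that do not fall on a full hour
combined = pd.concat([environ_toy, traffic_toy], axis=1)

# One aggregation per column; "sum" skips the NaN rows
hourly = combined.resample("1h").agg(
    {"temperature": "max", "sunshine": "sum", "light_veh": "sum"}
)
print(hourly)
```

Which aggregation fits each column is a modelling choice: a sum makes sense for counts and durations, while max, min or mean may suit a measurement such as temperature.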

## Heatmaps
You're going to keep working with the environmental dataset from the previous lessons. However, it now contains 3 additional columns about a river's water-status in the same geographic area.

The additional columns are:

- flow-rate in m³/s
- water-level in cm
- water-temperature in °C

You'll now try to find correlations between the columns.

pandas as pd, seaborn as sns and matplotlib.pyplot as plt have been imported for you.

In [None]:
# Calculate correlation
corr = data.corr()

# Print correlation
print(corr)

# Create a heatmap
sns.heatmap(corr, annot=True)

# Show plot
plt.show()
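
Each cell of the heatmap is a pairwise correlation coefficient. By default, DataFrame.corr() uses the Pearson coefficient, which you can verify by hand on a tiny example. The sketch below uses made-up values; only the column names mirror the river columns, and toy, x, y and manual_r are hypothetical names.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names mirror the river columns, values are invented
toy = pd.DataFrame({
    "flow-rate": [1.0, 2.0, 3.0, 4.0, 5.0],
    "water-level": [12.0, 14.0, 16.0, 18.0, 20.0],  # exactly linear in flow-rate
    "water-temperature": [9.0, 7.0, 8.0, 5.0, 4.0],
})

# Pairwise Pearson correlation matrix (symmetric, ones on the diagonal)
toy_corr = toy.corr()

# The same coefficient computed by hand for one pair of columns
x = toy["flow-rate"]
y = toy["water-temperature"]
x_dev = x - x.mean()
y_dev = y - y.mean()
manual_r = (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())

print(toy_corr)
print(manual_r)
```

A coefficient near 1 or -1 indicates a strong linear relationship; it says nothing about causation or about non-linear dependence.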

## Pairplot
You'll now further investigate the data using a pairplot.

A pairplot can be a useful tool since it combines histograms, showing the distribution of each column, with scatter plots, showing the relationship between each pair of columns.

You'll work with the water-status data we've seen before.

pandas as pd and matplotlib.pyplot as plt have been imported for you, and the data is available as data.

In [None]:
# Import required modules
import seaborn as sns

# Create a pairplot
sns.pairplot(data)

# Show plot
plt.show()

## Standard deviation
You should now be familiar with the environmental dataset. However, until now you used a cleaned version of the dataset. The original dataset contained multiple outliers, which would have tainted the analysis.

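You'll now work on visualizing these outliers using the standard-deviation method you've just learned: values more than three standard deviations away from the mean are treated as outliers.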

In [None]:
# Calculate mean and standard deviation of the temperature
data["mean"] = data['temperature'].mean()
data['std_temp'] = data['temperature'].std()

# Calculate upper and lower limits
data["upper_limit"] = data['mean'] + (3 * data['std_temp'])
data["lower_limit"] = data['mean'] - (3 * data['std_temp'])

# Plot the DataFrame
data.plot()

plt.show()
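
Besides plotting the limits, the same rule can be used to select the outliers directly. The following is a minimal sketch on synthetic data, not the course dataset: temps, outliers and cleaned are hypothetical names, and one extreme value is injected on purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic series: smooth hourly values around 20 with one injected outlier
idx = pd.date_range("2018-01-01", periods=200, freq="1h")
temps = pd.Series(20 + 2 * np.sin(np.arange(200)), index=idx, name="temperature")
temps.iloc[50] = 60.0

# Limits at three standard deviations around the mean
mean = temps.mean()
std = temps.std()
upper = mean + 3 * std
lower = mean - 3 * std

# Boolean masks select the values outside and inside the limits
outliers = temps[(temps > upper) | (temps < lower)]
cleaned = temps[(temps >= lower) & (temps <= upper)]

print(outliers)
```

Keep in mind that extreme values inflate the mean and the standard deviation themselves, so on short or heavily contaminated series this rule can miss outliers.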

## Autocorrelation
For this exercise, you'll be using the traffic dataset. I've combined the columns for heavy and light vehicles into one "vehicles" column, since we've seen before that the two behave as one.

You'll first plot the data, and then visualize the autocorrelation, before answering some questions about the created plots.

The data is available as traffic.

In [None]:
# Plot traffic dataset
traffic[:"2018-11-10"].plot()

# Show plot
plt.show()

# Import tsaplots
from statsmodels.graphics import tsaplots

# Plot autocorrelation
tsaplots.plot_acf(traffic['vehicles'], lags=50)


# Show the plot
plt.show()
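
To build intuition for what the peaks in an autocorrelation plot mean, you can compute single lags yourself. The sketch below uses a synthetic hourly series with a perfect 24-hour cycle; vehicles_toy and its values are hypothetical. It relies on pandas' Series.autocorr, which computes the Pearson correlation between the series and a lagged copy of itself, so the numbers can differ slightly from the estimator used by plot_acf.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic series: ten days of hourly counts with a clean daily cycle
idx = pd.date_range("2018-11-01", periods=240, freq="1h")
hours = np.arange(240)
vehicles_toy = pd.Series(100 + 50 * np.sin(2 * np.pi * hours / 24), index=idx)

# Correlation of the series with itself shifted by one full day and by half a day
r24 = vehicles_toy.autocorr(lag=24)
r12 = vehicles_toy.autocorr(lag=12)

print(r24, r12)
```

A lag equal to the cycle length lines the series up with itself, giving a strongly positive value, while half a cycle puts peaks on top of troughs, giving a strongly negative one.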

## Seasonal decomposition
In the last exercise, you identified some repetitive patterns in the traffic data with both visual inspection and using an autocorrelation plot.

You'll now dissect this data further by splitting it into its components.

The data has been loaded for you into traffic.

In [None]:
# Import modules
import statsmodels.api as sm

# Perform decomposition
res = sm.tsa.seasonal_decompose(traffic['vehicles'])

# Print the seasonal component
print(res.seasonal.head())

# Plot the result
res.plot()

# Show the plot
plt.show()
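
To make the three components less of a black box, here is a simplified additive decomposition done by hand with pandas. It is a sketch under stated assumptions, not the algorithm statsmodels uses: the trend is a plain centered 24-hour rolling mean, the seasonal part is the average detrended value per hour of day, and the data is synthetic (series, trend_est, seasonal and resid are hypothetical names).

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic series: linear trend plus a clean 24-hour cycle
idx = pd.date_range("2018-11-01", periods=240, freq="1h")
t = np.arange(240)
series = pd.Series(0.01 * t + np.sin(2 * np.pi * t / 24), index=idx)

# Trend: centered rolling mean over one full cycle (NaN at both edges)
trend_est = series.rolling(window=24, center=True).mean()

# Seasonal: mean of the detrended values for each hour of the day
detrended = series - trend_est
seasonal = detrended.groupby(detrended.index.hour).transform("mean")

# Residual: what is left after removing trend and seasonality
resid = series - trend_est - seasonal

print(seasonal.head())
```

On real data the residual will not vanish as it does here; large residuals point to values that neither the trend nor the repeating pattern explains.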

In [None]:
# Resample DataFrame to 1h
df_seas = df.resample('1h').max()

# Run seasonal decompose
decomp = sm.tsa.seasonal_decompose(df_seas)

# Plot the timeseries
plt.title("Temperature")
plt.plot(df_seas['temperature'], label="temperature")

# Plot trend and seasonality
plt.plot(decomp.trend, label="trend")
plt.plot(decomp.seasonal, label="seasonal")
plt.legend()
plt.show()
