# Visualisation of Raw Data

Here we will visualise the raw data.

### Import libraries

(Note: this may take a while.)

In [1]:
```python
import pandas as pd
import matplotlib.pyplot as plt
```

### Load the file

Load the external `.csv` file into a local DataFrame variable.

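In [2]:
```python
# Read the comma-separated file into a DataFrame (',' is also read_csv's default delimiter)
data_frame = pd.read_csv('raw_data.csv', delimiter=',')
```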
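By default `read_csv` does not parse date-like columns, so a timestamp column such as `time` is loaded as plain strings (`object` dtype). The sketch below shows one way to request datetime parsing at load time. It uses a small in-memory CSV instead of `raw_data.csv`; the sample rows, the station name and the ISO-8601 timestamp format are all hypothetical, since the real file's format is not shown here.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for raw_data.csv; the real timestamp
# format is not shown in this notebook, so ISO-8601 strings are assumed.
csv_text = """time,name,air_temp,rel_hum
2023-01-01 09:00:00,Station A,18.2,71
2023-01-01 09:30:00,Station A,19.0,68
"""

# parse_dates converts the listed columns to datetime64 while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])
print(df.dtypes)
```

If the real timestamps use another format, converting afterwards with `pd.to_datetime(..., format=...)` may be more reliable than parsing at load time.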

### Info about the data

`DataFrame.info()` reports, for each column:
 - Column name
 - Non-null count
 - Data type

In [3]:
```python
data_frame.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13647 entries, 0 to 13646
Data columns (total 20 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   time               13647 non-null  object
 1   wmo                13647 non-null  int64
 2   name               13647 non-null  object
 3   history_product    13647 non-null  object
 4   air_temp           13647 non-null  float64
 5   apparent_t         13638 non-null  float64
 6   dewpt              13647 non-null  float64
 7   rel_hum            13647 non-null  int64
 8   delta_t            13647 non-null  float64
 9   wind_dir_deg       13638 non-null  float64
 10  wind_spd_kmh       13638 non-null  float64
 11  gust_kmh           13631 non-null  float64
 12  rain_trace         13385 non-null  float64
 13  rain_ten           13617 non-null  float64
 14  rain_hour          13588 non-null  float64
 15  duration_from_9am  13385 non-null  float64
 16  press              136
```
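The non-null counts from `info()` can also be turned around into missing-value counts, which are often easier to scan. A minimal sketch, using a small hypothetical DataFrame whose column names mirror the real ones but whose values are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few columns of raw_data.csv; NaN marks a gap
df = pd.DataFrame({
    "air_temp": [18.2, 19.0, 17.5, 20.1],
    "apparent_t": [16.0, np.nan, 15.2, 18.4],
    "gust_kmh": [30.0, np.nan, np.nan, 22.0],
})

non_null = df.count()               # same figures as info()'s "Non-Null Count"
missing = df.isna().sum()           # complementary view: gaps per column
missing_pct = missing / len(df) * 100

print(pd.DataFrame({"non_null": non_null, "missing": missing, "missing_%": missing_pct}))
```

Applied to `data_frame`, `data_frame.isna().sum()` would report 13647 - 13385 = 262 missing values for `rain_trace`, consistent with the non-null count shown by `info()`.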

### Statistics

`DataFrame.describe()` reports, for each numeric column:
 - Count
 - Mean
 - Standard deviation
 - Minimum
 - 25th percentile
 - Median
 - 75th percentile
 - Maximum

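In [4]:
```python
# Keep only numeric columns (floats and integers) for summary statistics.
# Note: this overwrites data_frame with the numeric subset.
data_frame = data_frame.select_dtypes(include=['float64', 'int64'])
# Drop the station identifier and coordinate columns, whose statistics are not meaningful
data_frame = data_frame.drop(columns=['wmo', 'lat', 'lon'])
data_frame.describe()
```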

| | air_temp | apparent_t | dewpt | rel_hum | delta_t | wind_dir_deg | wind_spd_kmh | gust_kmh | rain_trace | rain_ten | rain_hour | duration_from_9am | press |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 13647.0 | 13638.0 | 13647.0 | 13647.0 | 13647.0 | 13638.0 | 13638.0 | 13631.0 | 13385.0 | 13617.0 | 13588.0 | 13385.0 | 13644.0 |
| mean | 18.813417 | 15.89857 | 12.270865 | 68.50011 | 3.501539 | 183.087036 | 20.601335 | 27.172768 | 1.636466 | 0.022266 | 0.129747 | 727.685245 | 1017.346299 |
| std | 5.148094 | 6.407123 | 5.766975 | 17.673533 | 2.414986 | 103.764368 | 10.447793 | 13.955201 | 6.992598 | 0.205415 | 0.902639 | 421.323633 | 7.162183 |
| min | 5.6 | -1.1 | -12.3 | 7.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 992.9 |
| 25% | 14.8 | 11.0 | 8.0 | 58.0 | 1.8 | 100.0 | 13.0 | 17.0 | 0.0 | 0.0 | 0.0 | 360.0 | 1012.0 |
| 50% | 19.1 | 16.1 | 12.7 | 71.0 | 3.0 | 183.0 | 19.0 | 26.0 | 0.0 | 0.0 | 0.0 | 720.0 | 1017.2 |
| 75% | 22.5 | 20.9 | 17.1 | 82.0 | 4.6 | 275.0 | 28.0 | 35.0 | 0.2 | 0.0 | 0.0 | 1110.0 | 1022.2 |
| max | 41.5 | 39.5 | 24.1 | 98.0 | 17.3 | 360.0 | 70.0 | 100.0 | 81.4 | 6.6 | 28.4 | 1440.0 | 1039.6 |
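Each row of the `describe()` table corresponds to an ordinary pandas reduction. The sketch below rebuilds the table from those individual calls on a small hypothetical two-column sample (the values are made up), which makes explicit that `std` is the sample standard deviation (`ddof=1`) and that the percentiles use linear interpolation.

```python
import pandas as pd

# Hypothetical sample standing in for two numeric columns of raw_data.csv
df = pd.DataFrame({
    "air_temp": [14.8, 19.1, 22.5, 18.0, 25.3, 11.2],
    "rel_hum": [58, 71, 82, 64, 45, 90],
})

# One row per statistic, in the same order describe() uses
summary = pd.DataFrame({
    "count": df.count(),           # non-missing values only
    "mean": df.mean(),
    "std": df.std(),               # sample standard deviation (ddof=1)
    "min": df.min(),
    "25%": df.quantile(0.25),      # linear interpolation by default
    "50%": df.median(),
    "75%": df.quantile(0.75),
    "max": df.max(),
}).T

print(summary)
```

Because `count` excludes missing values, it differs between columns in the real table, for example 13385 for `rain_trace` against 13647 for `air_temp`.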


### Skewness

In [5]:
```python
# Calculate the skewness of each numeric column
skewness = data_frame.skew()

print(f"Skewness:\n{skewness}")
```

```
Skewness:
air_temp              0.030047
apparent_t           -0.043679
dewpt                -0.412804
rel_hum              -0.672205
delta_t               1.387489
wind_dir_deg         -0.158023
wind_spd_kmh          0.837242
gust_kmh              0.954444
rain_trace            7.237378
rain_ten             17.924453
rain_hour            13.368051
duration_from_9am     0.035108
press                 0.088362
dtype: float64
```
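`Series.skew()` returns the adjusted (sample-corrected) Fisher-Pearson coefficient, $G_1 = \frac{\sqrt{n(n-1)}}{n-2}\, \frac{m_3}{m_2^{3/2}}$, where $m_2$ and $m_3$ are the biased second and third central moments. The sketch below computes it by hand on a hypothetical rainfall-like sample (mostly zeros, a few large values) and then flags the columns in the skewness output above whose absolute skewness exceeds 1. The function name `adjusted_skew` is illustrative, and the cut-off of 1 is an assumed rule of thumb, not something the notebook specifies.

```python
import numpy as np
import pandas as pd


def adjusted_skew(values: pd.Series) -> float:
    """Adjusted Fisher-Pearson skewness, ignoring missing values."""
    x = values.dropna().to_numpy(dtype=float)
    n = x.size
    deviations = x - x.mean()
    m2 = np.mean(deviations ** 2)   # biased second central moment
    m3 = np.mean(deviations ** 3)   # biased third central moment
    g1 = m3 / m2 ** 1.5             # population (biased) skewness
    return np.sqrt(n * (n - 1)) / (n - 2) * g1  # small-sample correction


# Hypothetical right-skewed sample resembling a rainfall column
rain = pd.Series([0.0, 0.0, 0.0, 0.2, 0.0, 0.4, 0.0, 12.6, 0.0, 1.8])
manual = adjusted_skew(rain)
print(f"manual: {manual:.6f}, pandas: {rain.skew():.6f}")

# Skewness values copied from the notebook output above
skewness = pd.Series({
    "air_temp": 0.030047, "apparent_t": -0.043679, "dewpt": -0.412804,
    "rel_hum": -0.672205, "delta_t": 1.387489, "wind_dir_deg": -0.158023,
    "wind_spd_kmh": 0.837242, "gust_kmh": 0.954444, "rain_trace": 7.237378,
    "rain_ten": 17.924453, "rain_hour": 13.368051,
    "duration_from_9am": 0.035108, "press": 0.088362,
})

# Assumed rule of thumb: |skew| > 1 counts as highly skewed
highly_skewed = skewness[skewness.abs() > 1].index.tolist()
print(highly_skewed)
```

Under this cut-off the rain columns and `delta_t` stand out; strongly right-skewed columns like these may be easier to read in plots after a transform such as `np.log1p`.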
