# Transmission Dynamics

This notebook explores data from sensors recording the temperature of two train gearboxes.
The data is provided as a CSV file with the following columns:

- datetime: The date and time at which the recording took place, in the UTC timezone.
- temperature: The temperature recorded by the sensor, in degrees Celsius.
- gearbox: Which gearbox the recording came from – either “A” or “B”.

In [5]:
import pandas as pd 
import plotly.express as px

## Dataset overview

In [6]:
!pwd

/content


In [7]:
df = pd.read_csv('test_temperature_data.csv')

In [8]:
df.head()

Unnamed: 0,datetime,temperature,gearbox
0,2023-01-01 05:12:13,2.852171,A
1,2023-01-01 05:14:26,3.626351,B
2,2023-01-01 05:17:13,2.666521,A
3,2023-01-01 05:19:26,3.41751,B
4,2023-01-01 05:22:13,2.148986,A


In [9]:
df.tail()

Unnamed: 0,datetime,temperature,gearbox
1147,2023-01-03 04:59:26,5.05608,B
1148,2023-01-03 05:02:13,5.796399,A
1149,2023-01-03 05:04:26,4.055147,B
1150,2023-01-03 05:07:13,5.193202,A
1151,2023-01-03 05:09:26,4.592481,B


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1152 entries, 0 to 1151
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   datetime     1152 non-null   object 
 1   temperature  1152 non-null   float64
 2   gearbox      1152 non-null   object 
dtypes: float64(1), object(2)
memory usage: 27.1+ KB


The data contains 1152 observations. Two of the three variables (datetime and gearbox) are stored with the object data type.

In [11]:
df.describe()

Unnamed: 0,temperature
count,1152.0
mean,22.17244
std,14.858138
min,1.893541
25%,6.397799
50%,22.286165
75%,35.196479
max,52.536204


In [12]:
df['datetime'] = pd.to_datetime(df['datetime'])

In [13]:
df['gearbox'] = df['gearbox'].astype('category')
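The two conversions above could also be done while reading the file. The sketch below is one possible way to do this, not the method used in this notebook: it reads a small inline sample (three rows copied from `df.head()`) instead of the real CSV, and it additionally attaches the UTC timezone that the column description mentions. The names `sample_csv` and `typed` are made up for illustration.

```python
import io
import pandas as pd

# Tiny inline sample mimicking the file layout (rows copied from df.head()).
sample_csv = io.StringIO(
    "datetime,temperature,gearbox\n"
    "2023-01-01 05:12:13,2.852171,A\n"
    "2023-01-01 05:14:26,3.626351,B\n"
    "2023-01-01 05:17:13,2.666521,A\n"
)

typed = pd.read_csv(
    sample_csv,
    parse_dates=["datetime"],        # parse timestamps while reading
    dtype={"gearbox": "category"},   # store the gearbox label as a category
)

# The timestamps are described as UTC, so make the timezone explicit.
typed["datetime"] = typed["datetime"].dt.tz_localize("UTC")

print(typed.dtypes)
```

Making the timezone explicit is optional; it mainly helps if the readings are later compared with local operating hours.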

## Exploratory Data Analysis

In [14]:
df.isnull().sum()

datetime       0
temperature    0
gearbox        0
dtype: int64

In [15]:
# Rows that are exact duplicates of an earlier row
duplicate_rows_df = df[df.duplicated()]
print(f'There are {len(duplicate_rows_df)} duplicated rows in the dataset')

There are 0 duplicated rows in the dataset


The dataset contains neither missing nor duplicated data.
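Missing values and duplicated rows are not the only possible data quality issues in sensor data: a reading can also be absent altogether. A minimal sketch of a gap check follows, assuming each gearbox reports every 5 minutes (an interval inferred from the first rows shown by `df.head()`, not confirmed by the source). The `sample` frame is made up and deliberately skips one reading for gearbox A.

```python
import pandas as pd

# Hypothetical sample with one missing reading for gearbox A (illustration only).
sample = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2023-01-01 05:12:13", "2023-01-01 05:14:26",
        "2023-01-01 05:17:13", "2023-01-01 05:19:26",
        "2023-01-01 05:27:13", "2023-01-01 05:24:26",
    ]),
    "gearbox": ["A", "B", "A", "B", "A", "B"],
})

sample = sample.sort_values(["gearbox", "datetime"])

# Time elapsed since the previous reading from the same gearbox.
sample["gap"] = sample.groupby("gearbox")["datetime"].diff()

expected = pd.Timedelta(minutes=5)  # assumed sampling interval
gaps = sample[sample["gap"] > expected]
print(gaps)
```

Running the same logic on `df` would list any reading that arrived later than the assumed interval after its predecessor.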

## Outlier Visualisation



In [16]:
# Boxplots to get an idea of the distribution and outliers per gearbox
fig = px.box(df, x="gearbox", y="temperature", color="gearbox",
            title="Gearbox boxplots",
            width=1250, height=600,
            labels={"gearbox": "Type of gearbox", "temperature": "Temperature °C"},
            template="simple_white"
            )
fig.update_yaxes(showgrid=True)
fig.show()
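The boxplot shows outliers visually. To list them as rows, one option is the usual 1.5 × IQR rule applied per gearbox. The sketch below is an illustration on made-up readings; the helper `iqr_outliers` is hypothetical, and its quartiles come from pandas, so the bounds may not match the whiskers drawn by Plotly exactly.

```python
import pandas as pd

def iqr_outliers(frame: pd.DataFrame, value_col: str = "temperature",
                 group_col: str = "gearbox", k: float = 1.5) -> pd.DataFrame:
    """Return rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR] of their group."""
    grouped = frame.groupby(group_col, observed=True)[value_col]
    # Broadcast each group's quartiles back onto the rows of that group.
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    mask = (frame[value_col] < q1 - k * iqr) | (frame[value_col] > q3 + k * iqr)
    return frame[mask]

# Made-up readings for illustration; 90.0 is an obvious outlier for gearbox A.
sample = pd.DataFrame({
    "gearbox": ["A"] * 6 + ["B"] * 5,
    "temperature": [20.0, 21.0, 22.0, 23.0, 24.0, 90.0,
                    30.0, 31.0, 32.0, 33.0, 34.0],
})

outliers = iqr_outliers(sample)
print(outliers)
```

Calling `iqr_outliers(df)` on the notebook's data frame would return the flagged readings, if any, with their timestamps.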

## Visualisation for Client

In [17]:
fig = px.line(df, x="datetime", y="temperature", color="gearbox",
            width=1250, height=600,
            labels={
                "gearbox": "Type of gearbox",  "datetime": "Date", "temperature": "Temperature °C"
            },
            category_orders={
                "gearbox": ["A", "B"]
            },
            template="simple_white"
            )

fig.update_layout(yaxis_range=[0,55], hovermode='x unified')

fig.update_yaxes(showgrid=True,
                 tickcolor='black', 
                 tickfont=dict(family='Arial', color='black', size=14),
                 title_font=dict(size=16, family='Arial'))

fig.update_xaxes(tickcolor='black',
                 tickfont=dict(family='Arial', color='black', size=14),
                 title_font=dict(size=16, family='Arial'))

# Legend and title styling in a single call
fig.update_layout(legend_title_font_color="black",
                  title_text='Gearbox sensor readings',
                  title_x=0.5, title_y=0.9,
                  title_font=dict(size=22, family='Arial'))
fig.show()

In [18]:
fig.write_html("Gearbox.html")

## Observations

The plot shows a clear pattern in the readings: the temperature reaches its lowest point between 5 am and 8 am. Both gearboxes behave similarly across all readings, with only a slight temperature difference between them.
The gearbox temperature then rises gradually, peaking at roughly 50 °C.
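These visual observations can be backed with numbers by averaging the readings per hour of day and per gearbox. The sketch below uses synthetic data (a smooth daily cycle that is coldest around 06:00, with gearbox B offset by 1 °C), so the values it prints say nothing about the real sensors; applied to `df`, the same steps would give the actual coldest hour and the mean difference between the two gearboxes.

```python
import numpy as np
import pandas as pd

# Synthetic readings (not the real data): two days sampled every 5 minutes.
times = pd.date_range("2023-01-01", periods=48 * 12, freq="5min")
hours = (times.hour + times.minute / 60).to_numpy()
base = 25 - 20 * np.cos(2 * np.pi * (hours - 6) / 24)  # minimum at 06:00

sample = pd.concat(
    [
        pd.DataFrame({"datetime": times, "temperature": base, "gearbox": "A"}),
        pd.DataFrame({"datetime": times, "temperature": base + 1.0, "gearbox": "B"}),
    ],
    ignore_index=True,
)

# Mean temperature per hour of day (rows) and gearbox (columns).
hourly = (
    sample.assign(hour=sample["datetime"].dt.hour)
    .pivot_table(index="hour", columns="gearbox",
                 values="temperature", aggfunc="mean")
)

coldest_hour = hourly.mean(axis=1).idxmin()       # hour with the lowest mean
mean_offset = (hourly["B"] - hourly["A"]).mean()  # average B minus A
print(coldest_hour, mean_offset)
```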
Working conditions that could affect gearbox operation include:

1.   Running state: traction, braking, idling, and static.
2.   Line features: bridges, tunnels, turns, and ramps.

Reasons why gearbox temperature may vary:

Load: The amount of load or weight that a train is carrying can have a significant impact on the temperature of the gearbox.

Speed: The speed at which the train is traveling can also impact the temperature of the gearbox.

Ambient temperature: The temperature of the environment in which the train is operating can also have an impact on the temperature of the gearbox. When the ambient temperature is high, the gearbox may struggle to dissipate heat.

Lubrication: The quality and amount of lubrication in the gearbox can also impact its temperature. If the gearbox is not properly lubricated, it may generate more heat (due to friction) as it works harder to maintain the train's speed.

Maintenance: The overall condition of the gearbox and the train's other components can impact its temperature.

## Notes
- It would be interesting to look into the gearbox temperature thresholds.
- Why does the temperature remain close to zero for such a prolonged period? Is this normal, and what is the operational state of the gearbox at that time?
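
As a starting point for both notes, prolonged low-temperature periods can be located by grouping consecutive readings that fall below a cut-off. This is only a sketch: the 5 °C value in `LOW_THRESHOLD_C` is a hypothetical placeholder, not a known gearbox threshold, and the readings are made up for a single gearbox.

```python
import pandas as pd

LOW_THRESHOLD_C = 5.0  # hypothetical cut-off, not taken from the source

# Made-up readings for one gearbox, for illustration only.
sample = pd.DataFrame({
    "datetime": pd.date_range("2023-01-01 04:00", periods=8, freq="30min"),
    "temperature": [12.0, 4.0, 3.0, 2.5, 3.5, 9.0, 4.5, 15.0],
})

below = sample["temperature"] < LOW_THRESHOLD_C

# A new run starts whenever the below/above state changes.
run_id = (below != below.shift()).cumsum()

# Summarise each run of consecutive below-threshold readings.
runs = (
    sample[below]
    .groupby(run_id[below])["datetime"]
    .agg(start="min", end="max", readings="count")
)
runs["duration"] = runs["end"] - runs["start"]

longest = runs.sort_values("duration", ascending=False).iloc[0]
print(runs)
```

On the real data this would be applied per gearbox, and the start and end times of the longest run could then be compared with the train's operating schedule.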