## Exploratory Data Analysis
In this first level of data exploration, I will analyze and describe key aspects of the dataset, focusing on uncovering insights from my perspective. This level is designed for beginners, but I aim to provide a structured and thoughtful approach to answering the given questions. Specifically, I will explore the following:

+ What is the first and last date readings were taken on?
+ What is the average torque?
+ Which assembly line has the highest readings of machine downtime?

Through this process, I will apply fundamental data analysis techniques to extract meaningful insights and set the foundation for deeper exploration in subsequent levels.

In [1]:
# import the necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt


In [6]:
# read the cleaned data as a dataframe
machine_ori = pd.read_csv('../data/machine_downtime_cleaned.csv')
machine_ori.columns

Index(['Unnamed: 0', 'Date', 'Machine_ID', 'Assembly_Line_No',
       'Coolant_Temperature', 'Hydraulic_Oil_Temperature',
       'Spindle_Bearing_Temperature', 'Spindle_Vibration', 'Tool_Vibration',
       'Voltage(volts)', 'Torque(Nm)', 'Downtime', 'Hydraulic_Pressure(Pa)',
       'Coolant_Pressure(Pa)', 'Air_System_Pressure(Pa)', 'Cutting(N)',
       'Spindle_Speed(RPS)'],
      dtype='object')
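The `Unnamed: 0` column in the output above is what pandas typically produces when a CSV was saved with its row index and then read back without telling pandas about it. A minimal sketch of how it could be avoided, using a small in-memory stand-in for the real file (the two rows below are hypothetical and only illustrate the mechanism):

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for machine_downtime_cleaned.csv.
# The empty first header cell is what pandas names 'Unnamed: 0'.
csv_text = (
    ",Date,Machine_ID\n"
    "0,2021-11-24,Makino-L1-Unit1-2013\n"
    "1,2022-07-03,Makino-L2-Unit1-2015\n"
)

# Default read: the saved index shows up as an extra column.
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses that first column as the row index instead.
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(without_extra.columns))
```

The rest of this notebook keeps the original read, so `Unnamed: 0` still appears in the column list; it does not affect any of the calculations below.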

### 1.1 What is the first and last date readings were taken on?

The machine downtime readings were taken between 24 November 2021 and 3 July 2022, so the dataset covers roughly seven months of data.


In [5]:
# get the first and last reading dates
machine_ori['Date'].agg(['min', 'max'])

min    2021-11-24
max    2022-07-03
Name: Date, dtype: object
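The `dtype: object` in the output shows that `Date` is stored as text. Taking `min` and `max` still works here because the dates are in ISO `YYYY-MM-DD` form, but converting to real datetimes makes the length of the period easy to check. A minimal sketch, assuming the same date format; the middle date in the sample is made up, only the two boundary dates come from the output above:

```python
import pandas as pd

# Hypothetical sample: only the first and last dates come from the real output.
sample = pd.DataFrame({"Date": ["2021-11-24", "2022-03-15", "2022-07-03"]})

# Convert the text column to datetimes so date arithmetic is possible.
dates = pd.to_datetime(sample["Date"], format="%Y-%m-%d")

first_date, last_date = dates.min(), dates.max()
span_days = (last_date - first_date).days

print(first_date.date(), last_date.date(), span_days)
```

The span comes out at 221 days, a little over seven months.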

### 1.2 What is the average torque?
One of the key tasks at this level is to determine the average torque. The dataset covers three different machines, so a single overall average would blend their readings and could hide differences between them. A more accurate approach is to calculate the average torque for each machine separately, which accounts for variation between machines and gives a more precise picture of the data.

Torque is a crucial measurement in industrial settings, representing the rotational force applied to a machine. It directly impacts the efficiency, performance, and maintenance needs of machinery. Understanding the average torque for each machine allows us to identify potential operational inconsistencies and optimize machine performance.

To enhance interpretability, we will visualize the average torque per machine using a well-structured bar chart. This visualization will include appropriate color schemes, annotations, and a clear layout to make the insights more accessible and actionable.

### 1.2.1 How close are the torque values to the average?

While the average torque gives us an overall idea of the force exerted by each machine, it doesn't tell us how consistent the readings are over time. This is where standard deviation comes in.

+ A low standard deviation means the torque values are closely packed around the mean, indicating consistent machine performance.
+ A high standard deviation means the torque values fluctuate more, suggesting variability in machine operation, which could be due to load changes, tool wear, or operational inconsistencies.

By analyzing standard deviation, we can determine whether a machine maintains stable performance or if there are significant variations that might require further investigation.
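To make the idea concrete, here is a small illustration with two made-up series of torque readings that share the same mean but differ in spread. The numbers are hypothetical and are not taken from the dataset:

```python
import pandas as pd

# Hypothetical torque readings (Nm) for two made-up machines with the same mean.
steady = pd.Series([24.0, 25.0, 26.0, 25.0, 25.0])
erratic = pd.Series([15.0, 35.0, 20.0, 30.0, 25.0])

for name, readings in [("steady", steady), ("erratic", erratic)]:
    # Series.std() returns the sample standard deviation (ddof=1) by default.
    print(f"{name}: mean={readings.mean():.2f} Nm, std={readings.std():.2f} Nm")
```

Both series average 25 Nm, yet the second one has a much larger standard deviation, which is exactly the kind of difference the mean alone cannot show.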

For each machine unit, we computed the average torque, expressed in Newton-meters (Nm), along with its standard deviation:

+ Makino-L1-Unit1-2013: 24.98 Nm ± 6.07 Nm
+ Makino-L2-Unit1-2015: 25.21 Nm ± 6.22 Nm
+ Makino-L3-Unit1-2015: 25.56 Nm ± 6.07 Nm

The average torque tells us the typical force exerted by each machine, while the standard deviation (SD) provides insight into how much the torque values fluctuate over time.

+ A higher standard deviation (like 6.22 Nm for Makino-L2) indicates greater variation in torque readings, which could be due to changing loads, machine wear, or inconsistent operation.
+ A lower standard deviation suggests more stable and predictable performance.

By analyzing both metrics together, we can assess machine consistency, detect potential inefficiencies, and ensure optimal operation.

**Makino-L2-Unit1-2015** has the highest standard deviation (6.22 Nm), meaning its torque values fluctuate the most. The gap is small, though: **Makino-L1 and Makino-L3** both have a standard deviation of around 6.07 Nm, only about 0.15 Nm lower than Makino-L2.

Possible Causes:
+ Inconsistent workload distribution
+ Tool wear or improper calibration
+ Irregular material properties affecting torque
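One way to put the reported standard deviations in proportion is to divide each by its mean, which gives the coefficient of variation. This ratio is not part of the original analysis; it is added here only as a rough check, using the means and standard deviations reported in this section:

```python
import pandas as pd

# Means and standard deviations (Nm) as reported in this section.
torque_stats = pd.DataFrame(
    {
        "Machine_ID": [
            "Makino-L1-Unit1-2013",
            "Makino-L2-Unit1-2015",
            "Makino-L3-Unit1-2015",
        ],
        "mean": [24.975462, 25.209428, 25.555850],
        "std": [6.072066, 6.218479, 6.067492],
    }
)

# Coefficient of variation: spread relative to the typical torque level.
torque_stats["cv"] = torque_stats["std"] / torque_stats["mean"]

print(torque_stats.round(3))
```

All three ratios land close to 0.24, which suggests the machines fluctuate to a similar relative degree and that the differences in standard deviation should be read with caution.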



In [9]:
# group by machine and compute the mean and standard deviation of torque
avg_torque_per_machine = machine_ori.groupby('Machine_ID')['Torque(Nm)'].agg(['mean', 'std']).reset_index()
avg_torque_per_machine

| | Machine_ID | mean | std |
|---|---|---|---|
| 0 | Makino-L1-Unit1-2013 | 24.975462 | 6.072066 |
| 1 | Makino-L2-Unit1-2015 | 25.209428 | 6.218479 |
| 2 | Makino-L3-Unit1-2015 | 25.55585 | 6.067492 |
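The bar chart described in this section could be drawn roughly as follows. This is a sketch rather than the exact figure from the analysis: the values are copied from the table above so the block runs on its own, and the colour, figure size, and labels are my own choices. In the notebook, `avg_torque_per_machine` could be used in place of the hand-built frame, and the `matplotlib.use("Agg")` line can be dropped.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Values copied from the table above.
avg_torque = pd.DataFrame(
    {
        "Machine_ID": [
            "Makino-L1-Unit1-2013",
            "Makino-L2-Unit1-2015",
            "Makino-L3-Unit1-2015",
        ],
        "mean": [24.975462, 25.209428, 25.555850],
        "std": [6.072066, 6.218479, 6.067492],
    }
)

fig, ax = plt.subplots(figsize=(8, 4))

# One bar per machine, with the standard deviation drawn as an error bar.
bars = ax.bar(
    avg_torque["Machine_ID"],
    avg_torque["mean"],
    yerr=avg_torque["std"],
    capsize=6,
    color="steelblue",
)

# Annotate each bar with its mean torque.
ax.bar_label(bars, labels=[f"{m:.2f} Nm" for m in avg_torque["mean"]], padding=3)

ax.set_xlabel("Machine ID")
ax.set_ylabel("Average torque (Nm)")
ax.set_title("Average torque per machine (error bars: ±1 SD)")
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view or save the figure.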


### 1.3 Which assembly line has the highest number of machine failures?

In addition to analyzing torque, we also examined the number of machine failures across different assembly lines. Machine failures can be caused by various factors, such as mechanical wear, excessive loads, improper calibration, or environmental conditions. Below are the recorded failures for each assembly line:

+ Shopfloor-L1: 454 failures (Highest)
+ Shopfloor-L3: 415 failures
+ Shopfloor-L2: 396 failures (Lowest)

Key Observations:
+ Shopfloor-L1 has the highest number of failures (454), indicating that machines in this section may be experiencing higher stress, improper maintenance, or operational inefficiencies.
+ Shopfloor-L3 is slightly better but still has a significant failure count (415).
+ Shopfloor-L2 has the lowest failure count (396), suggesting it may have better maintenance or less intensive workloads.

In [18]:
# count the machine failures recorded on each assembly line
machine_failure_reading = (
    machine_ori[machine_ori['Downtime'] == 'Machine_Failure']
    .groupby('Assembly_Line_No')['Downtime']
    .value_counts()
    .reset_index()
)

machine_failure_reading.sort_values(by='count', ascending=False)

| | Assembly_Line_No | Downtime | count |
|---|---|---|---|
| 0 | Shopfloor-L1 | Machine_Failure | 454 |
| 2 | Shopfloor-L3 | Machine_Failure | 415 |
| 1 | Shopfloor-L2 | Machine_Failure | 396 |
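Raw failure counts can be misleading if the assembly lines did not record the same number of readings, so it may be worth comparing failure rates as well. The sketch below uses a small hypothetical sample, not the real data, and the `No_Machine_Failure` label is an assumption about the other value in the `Downtime` column:

```python
import pandas as pd

# Hypothetical mini-sample; 'No_Machine_Failure' is an assumed label.
sample = pd.DataFrame(
    {
        "Assembly_Line_No": (
            ["Shopfloor-L1"] * 5 + ["Shopfloor-L2"] * 3 + ["Shopfloor-L3"] * 4
        ),
        "Downtime": (
            ["Machine_Failure"] * 3 + ["No_Machine_Failure"] * 2    # L1
            + ["Machine_Failure"] * 2 + ["No_Machine_Failure"] * 1  # L2
            + ["Machine_Failure"] * 1 + ["No_Machine_Failure"] * 3  # L3
        ),
    }
)

# Flag failures, then count failures, total readings, and the failure share per line.
summary = (
    sample.assign(is_failure=sample["Downtime"].eq("Machine_Failure"))
    .groupby("Assembly_Line_No")["is_failure"]
    .agg(failures="sum", readings="count", failure_rate="mean")
    .sort_values("failure_rate", ascending=False)
)

print(summary)
```

In this made-up sample, Shopfloor-L1 has the most failures but Shopfloor-L2 has the highest failure rate, which shows why the two views can disagree. Running the same aggregation on `machine_ori` would show whether the ranking above holds once the number of readings per line is taken into account.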
