<img src="Images/SpaceX.jpg" style = "width: 2800px; height: 800px; object-fit: cover; float: left; text-align: left;" alt = "image description">

# SpaceX - Satellite Failure Prediction
<hr style="border-top: 2px solid black;">

**Introduction:** SpaceX, a leading innovator in the satellite industry, provides advanced satellite-based services and technology to customers globally. With the increasing demand for satellite-based internet and communications, the company is dedicated to ensuring the reliability and longevity of their satellite fleet while minimizing operational costs. One of their most notable ventures is the Starlink program, which aims to provide high-speed internet access to remote and underserved areas around the world through a network of thousands of satellites in low Earth orbit. Starlink's goal is to offer satellite-based internet access to individuals, businesses, and organizations, who would otherwise be unable to access high-speed internet, thus closing the digital divide and connecting the world.

**Business Problem:** Through its Starlink program, SpaceX faces a major challenge in maintaining the integrity of its satellite fleet. Replacing failed satellites is expensive and can degrade the availability and quality of service for customers. To address this problem, the company aims to predict which satellites are at risk of failure so that it can take preventative measures and avoid costly replacements. Using machine learning techniques, we can analyze telemetry data, weather data, and other relevant information to identify patterns and trends that indicate a satellite's likelihood of failure. This will enable the company to address potential issues proactively and ensure continuity of service for customers, while reducing costs and maintaining the overall efficiency of the satellite fleet.

**Project Overview:** This project aims to address the challenge of maintaining the integrity and reliability of SpaceX's satellite fleet through the implementation of advanced data analysis and machine learning techniques. As the lead Data Engineer and Data Scientist, my role is to design and implement a comprehensive data pipeline that ingests, processes, and analyzes various data sources such as telemetry, weather and satellite configuration data. By leveraging these insights, I will create predictive models that identify the likelihood of satellite failure based on various factors such as satellite age, orbital characteristics and environmental conditions. These predictions will enable SpaceX to adopt preventative measures and optimize their satellite fleet, resulting in increased reliability, improved efficiency and cost-effectiveness of their operations. Ultimately, this project aims to enhance the customer experience and contribute to the company's profitability.

**Data Collection:** The satellite and telemetry data used in this project was obtained from Kaggle (https://www.kaggle.com/), a platform that hosts a wide range of datasets. The weather data was sourced from the National Oceanic and Atmospheric Administration (NOAA) (https://www.noaa.gov/). The data was collected over a period of several years to ensure a large and diverse dataset for training the predictive models. This data is used solely for personal projects aimed at improving my machine learning skills.

## Data Inspection

In [None]:
!ls Data

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

import plotly.express as px
import bokeh
import networkx as nx
import folium

from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, ColorBar
from bokeh.palettes import Spectral6
from bokeh.transform import linear_cmap

In [None]:
df1 = pd.read_csv('Data/SpaceX Satellite Dataset.csv', index_col = 'Satellite ID(Fake)', 
                  usecols = lambda column: column not in ["Unnamed: 0"])
df2 = pd.read_csv('Data/weather.csv')
df3 = pd.read_csv('Data/iot_telemetry_data.csv')

In [None]:
# Reading the SpaceX Satellite dataset
df1.head()

In [None]:
df1.info()

In [None]:
df1.shape

In [None]:
df1.isna().sum()

In [None]:
# It seems like we have 3 types of data. (Int64, Object, and Float64)
df1.dtypes

In [None]:
df1.duplicated().sum()

In [None]:
# Count distinct values per column. Almost all values in df1 are unique and not repeated, except Expected Lifetime.
df1.nunique()

In [None]:
df1.describe()

In [None]:
plt.figure(figsize=(20,10))
my_cmap = sns.color_palette("mako")
# Use the palette only; a separate `color` argument conflicts with it
sns.countplot(x = "Age", data = df1, palette = my_cmap)
plt.xlabel("Age (days)", fontsize = 25)
plt.ylabel("Count", fontsize = 25)
plt.title("Satellites Age Distribution", fontsize = 30)
plt.xticks(fontsize = 16)
plt.yticks(fontsize = 16)
plt.show()

**NOTE:** The age of the satellites in the dataset is presented in days, with the oldest satellite having an age of 1336 days and the youngest having an age of 786 days. This indicates that the oldest satellite was launched 1336 days ago and the youngest was launched 786 days ago, relative to the current date.
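
The relationship between age and launch date described above can be sketched as follows. This is a minimal illustration, not part of the original analysis: the helper name `estimate_launch_dates` and the reference date `2023-01-01` are assumptions chosen for the example, since the dataset's snapshot date is not stated. The two ages are the minimum and maximum quoted in the note.

```python
import pandas as pd


def estimate_launch_dates(age_days: pd.Series, as_of: str) -> pd.Series:
    """Convert satellite ages (in days) into approximate launch dates."""
    reference = pd.Timestamp(as_of)
    return reference - pd.to_timedelta(age_days, unit="D")


# Ages taken from the note above; the reference date is a hypothetical placeholder.
ages = pd.Series([786, 1336], name="Age")
launch_dates = estimate_launch_dates(ages, as_of="2023-01-01")
print(launch_dates)
```

With the real data, the same helper could be applied to `df1["Age"]` once the date the ages were recorded is known.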

In [None]:
# Inspecting weather dataset.
df2.head()

In [None]:
df2.shape

In [None]:
# 3 types of data as well 
df2.info()

In [None]:
# A few columns have fewer than 20 missing values, and a few have more than 300
df2.isna().sum()
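
Raw missing-value counts are easier to judge next to the share of rows they represent. The sketch below shows one way to build such a report. The helper name `missing_report` and the tiny sample frame are assumptions made for illustration; only the column names `WSF2` and `WDF2` come from this notebook.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """List columns with missing values, as a count and a percentage of rows."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only columns that actually have gaps, worst first
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical sample data standing in for the weather dataset
sample = pd.DataFrame({
    "WSF2": [1.0, float("nan"), 3.0, float("nan")],
    "WDF2": [10.0, 20.0, float("nan"), 40.0],
})
report = missing_report(sample)
print(report)
```

Calling `missing_report(df2)` would give the same view for the weather data and help decide which columns to impute and which to drop.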

In [None]:
df2.duplicated().sum()

In [None]:
df2.describe()

In [None]:
# Count distinct values per column. Almost all values in df2 are unique and not repeated, except a few columns with missing values.
df2.nunique()

In [None]:
# Color the circles according to a continuous variable; WSF2 is used here.
# Series.min()/max() skip NaN values, unlike the built-in min()/max().
mapper = linear_cmap(field_name = 'WSF2', palette = Spectral6,
                     low = df2['WSF2'].min(), high = df2['WSF2'].max())

In [None]:
# Create a scatterplot that shows the relationship between windspeed and wind direction.

p = figure(width = 900, height = 600)

# Customize axis labels (tick label font sizes are set further down)
p.xaxis.axis_label = 'Wind Speed (mph)'
p.yaxis.axis_label = 'Wind Direction (degrees)'

# Adding the scatter plot
p.scatter(x = 'WSF2', y = 'WDF2', source = df2, fill_alpha = 0.8, line_color = mapper, color = mapper, size = 10)

# Adding the title
p.title.text = 'Wind Speed Vs Wind Direction'
p.title.text_font_size = '20pt'
p.title.align = 'center'

# Setting the tick label font size for both axes
p.xaxis.major_label_text_font_size = '10pt'
p.yaxis.major_label_text_font_size = '10pt'

# Adding the color bar that serves as the legend for wind speed
color_bar = ColorBar(color_mapper = mapper['transform'], width = 2, location = (0,0), title = "Wind Speed")
p.add_layout(color_bar, 'right')

# Show the chart
show(p)

**NOTE:** The scatter plot illustrates the range of wind direction and wind speed values observed in the data, with wind direction values ranging from approximately 10 to 350 degrees and wind speed values ranging from approximately 3 to 48 mph. The high variability in wind direction and wind speed could potentially have an impact on the performance and integrity of the satellite fleet. The analysis of this data will aid in identifying potential operational issues that may lead to satellite failure and ultimately help in proactively addressing these issues to ensure the continuity of service for customers.
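
The ranges quoted above can be read off the data directly rather than estimated from the plot. The following is a minimal sketch under stated assumptions: the helper name `wind_ranges` and the sample values are hypothetical, while `WSF2` and `WDF2` are the columns used in the scatter plot.

```python
import pandas as pd


def wind_ranges(weather: pd.DataFrame, columns=("WSF2", "WDF2")) -> pd.DataFrame:
    """Return the minimum, maximum, and range of each wind column."""
    stats = weather[list(columns)].agg(["min", "max"]).T  # NaN values are skipped
    stats["range"] = stats["max"] - stats["min"]
    return stats


# Hypothetical sample values standing in for the weather dataset
sample = pd.DataFrame({
    "WSF2": [3.1, 12.5, float("nan"), 48.1],
    "WDF2": [10.0, 180.0, 270.0, 350.0],
})
ranges = wind_ranges(sample)
print(ranges)
```

Running `wind_ranges(df2)` would confirm the approximate limits described in the note.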

In [None]:
# Inspecting Telemetry Dataset.
df3.head()

In [None]:
# 405184 rows and 9 columns
df3.info()

In [None]:
df3.shape

In [None]:
# No missing data so far
df3.isna().sum()

In [None]:
df3.describe()

In [None]:
# We have 13 duplicates
df3.duplicated().sum()
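
Exact duplicate rows like these are usually removed before modeling. The sketch below shows one possible way to do that while reporting how many rows were dropped. The helper name `drop_exact_duplicates` is an assumption, and the sample column names `ts` and `device` are hypothetical stand-ins; only `temp` is taken from this notebook.

```python
import pandas as pd


def drop_exact_duplicates(df: pd.DataFrame):
    """Remove fully duplicated rows and report how many were dropped."""
    before = len(df)
    deduped = df.drop_duplicates().reset_index(drop=True)
    return deduped, before - len(deduped)


# Hypothetical sample rows standing in for the telemetry dataset
telemetry = pd.DataFrame({
    "ts": [1.0, 1.0, 2.0, 3.0],
    "device": ["a", "a", "a", "b"],
    "temp": [22.7, 22.7, 22.6, 19.7],
})
deduped, removed = drop_exact_duplicates(telemetry)
print(f"Removed {removed} duplicate row(s); {len(deduped)} remain.")
```

Applied as `df3, removed = drop_exact_duplicates(df3)`, this keeps the first occurrence of each duplicated row.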

In [None]:
# Count distinct values per column
df3.nunique()

In [None]:
from bokeh.plotting import figure, show
from bokeh.palettes import Category20

p = figure(x_axis_label = 'Time (s)', y_axis_label = 'Value', width = 1000, height = 600)

colors = Category20[4]

# Plot each sensor against time. Assumption: the timestamp column is named 'ts' (Unix seconds).
p.line(df3['ts'], df3['temp'], color = colors[0], legend_label = 'Temperature')
p.line(df3['ts'], df3['humidity'], color = colors[1], legend_label = 'Humidity')
p.line(df3['ts'], df3['light'], color = colors[2], legend_label = 'Light')
p.line(df3['ts'], df3['smoke'], color = colors[3], legend_label = 'Smoke')

p.title.text = 'Telemetry Data'
p.title.text_font_size = '20pt'
p.title.align = 'center'

p.xaxis.major_label_text_font_size = '10pt'
p.yaxis.major_label_text_font_size = '10pt'

p.legend.label_text_font_size = '14pt'
p.legend.title = 'Sensors'
p.legend.title_text_font_size = '14pt'

show(p)