# Predicting the intensity of the magnetic field experienced by satellites in Earth orbit

## Objective

To determine if the magnetic field experienced by satellites can be predicted from their altitude from Earth.

## Hypothesis

We can reasonably assume that the magnetic field will be less intense as the altitude increases.  However, because the Earth's geomagnetic field is not perfectly spherical but instead in the shape of a dipole, with anomalies and distortions from the pressure of the interplanetary magnetic field, the relationship between these two attributes might not be easily modeled.

## Dataset

MOST-268_HD209458_2014-268_HD209458_2014

A .tar file containing .fits files compressed as .tar files.

This dataset is available online: https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/en/search/?Plane.position.bounds@Shape1Resolver.value=ALL&Observation.collection=MOST&Observation.instrument.name=Direct%20image&Observation.type=object#sortCol=caom2%3APlane.time.bounds.lower&sortDir=dsc&col_1=_checkbox_selector;;;&col_2=caom2%3AObservation.uri;;;&col_3=caom2%3APlane.productID;;;&col_4=caom2%3AObservation.target.name;;;&col_5=caom2%3APlane.position.bounds.cval1;;;&col_6=caom2%3APlane.position.bounds.cval2;;;&col_7=caom2%3APlane.time.bounds.lower;;;&col_8=caom2%3AObservation.instrument.name;;;&col_9=caom2%3APlane.time.exposure;;;&col_10=caom2%3AObservation.proposal.pi;;;&col_11=caom2%3AObservation.proposal.id;;;&col_12=caom2%3APlane.calibrationLevel;;;&col_13=caom2%3AObservation.observationID;;;

The data was recorded by the MOST satellite: http://www.asc-csa.gc.ca/fra/satellites/most/default.asp

### Config Variables

In [1]:
DATASETS_PATH = "./datasets/most/"
DATA_DOWNLOAD_LIST = "./datasets/most/cadcUrlList_test.txt"

COLUMNS = [
    "[DEGREES] LONGITUDE OF SATELLITE",
    "[DEGREES] LATITUDE OF SATELLITE",
    "[M] ALTITUDE OF SATELLITE",
    "[DEGREES] ANGLE TO EARTH LIMB",
    "[DEGREES] NADIR RIGHT ASCENSION",
    "[DEGREES] NADIR DECLINATION",
    "[DEGREES] NADIR LONGITUDE",
    "[DEGREES] NADIR LATITUDE",
    "[DEGREES] SOLAR RIGHT ASCENSION",
    "[DEGREES] SOLAR DECLINATION",
    "[DEGREES] SOLAR ALTITUDE",
    "[DEGREES] SOLAR AZIMUTH",
    "[DEGREES] SOLAR LONGITUDE",
    "[DEGREES] SOLAR LATITUDE",
    "[DEGREES] LUNAR RIGHT ASCENSION",
    "[DEGREES] LUNAR DECLINATION",
    "[DEGREES] LUNAR ALTITUDE",
    "[DEGREES] LUNAR AZIMUTH",
    "[DEGREES] LUNAR LONGITUDE",
    "[DEGREES] LUNAR LATITUDE",
    "[DEGREES] LUNAR-TARGET ANGULAR SEPERATION",
    "[nT] MAGNETIC FIELD STRENGTH",
]

CATEGORICAL_FEATURES = [
    "[DEGREES] SOLAR ALTITUDE",
    "[DEGREES] SOLAR AZIMUTH",
    "[DEGREES] SOLAR LONGITUDE",
]

TARGET_FEATURE = "[nT] MAGNETIC FIELD STRENGTH"

### Data Collection

In [None]:
from cadcdata import StorageInventoryClient
client = StorageInventoryClient()

# test connection
print(client.cadcinfo("MOST/261_GSC0041702592_2014_5261.tar"))

In [None]:
# download data as *.tar files

# download one file only for tests
# client.cadcget("MOST/261_GSC0041702592_2014_5261.tar", "./datasets/most/")

# download a list of files from a web search
with open(DATA_DOWNLOAD_LIST, "r") as to_download:
    for row in to_download:
        f = row.split("cadc:")[1]
        client.cadcget(f, DATASETS_PATH)

In [2]:
import os
import tarfile

import pandas as pd
from astropy.io import fits


df_list = list()
df = pd.DataFrame(
    columns=[COLUMNS]
)


for root, dirs, files in os.walk(DATASETS_PATH):
    for f in files:
        
        # for every .tar file in the datasets directory
        if os.path.splitext(f)[1] == ".tar":

            with tarfile.open(
                name=os.path.join(root, f),
                mode="r"
            ) as tar_obj:

                # for every file in the tar file
                for member in tar_obj.getnames():
                    if os.path.splitext(member)[1] == ".fits":

                        # extract .tar file in memory
                        extracted = tar_obj.extractfile(member)

                        # open extracted .fits file
                        with fits.open(extracted) as hdul:
                            hdr = hdul[0].header
                            data_dct = {
                                "[DEGREES] LONGITUDE OF SATELLITE": hdr["SAT_LONG"],
                                "[DEGREES] LATITUDE OF SATELLITE": hdr["SAT_LAT"],
                                "[M] ALTITUDE OF SATELLITE": hdr["SAT_ALT"],
                                "[DEGREES] ANGLE TO EARTH LIMB": hdr["ELA_ANG"],
                                "[DEGREES] NADIR RIGHT ASCENSION": hdr["NAD_RA"],
                                "[DEGREES] NADIR DECLINATION": hdr["NAD_DEC"],
                                "[DEGREES] NADIR LONGITUDE": hdr["NAD_PHI"],
                                "[DEGREES] NADIR LATITUDE": hdr["NAD_THET"],
                                "[DEGREES] SOLAR RIGHT ASCENSION": hdr["SOL_RA"],
                                "[DEGREES] SOLAR DECLINATION": hdr["SOL_DEC"],
                                "[DEGREES] SOLAR ALTITUDE": ["SOL_ALTI"],
                                "[DEGREES] SOLAR AZIMUTH": ["SOL_AZIM"],
                                "[DEGREES] SOLAR LONGITUDE": ["SOL_PHI"],
                                "[DEGREES] SOLAR LATITUDE": hdr["SOL_THET"],
                                "[DEGREES] LUNAR RIGHT ASCENSION": hdr["LUN_RA"],
                                "[DEGREES] LUNAR DECLINATION": hdr["LUN_DEC"],
                                "[DEGREES] LUNAR ALTITUDE": hdr["LUN_ALTI"],
                                "[DEGREES] LUNAR AZIMUTH": hdr["LUN_AZIM"],
                                "[DEGREES] LUNAR LONGITUDE": hdr["LUN_PHI"],
                                "[DEGREES] LUNAR LATITUDE": hdr["LUN_THET"],
                                "[DEGREES] LUNAR-TARGET ANGULAR SEPERATION": hdr["LUN_SEP"],
                                "[nT] MAGNETIC FIELD STRENGTH": hdr["MAG_FLD"],
                            }
                            df_partial = pd.DataFrame(data_dct)
                            df_list.append(df_partial)

In [3]:
df_source = pd.concat(df_list, ignore_index=True)

### Exploratory Data Analysis

In [4]:
df_source.head()

Unnamed: 0,[DEGREES] LONGITUDE OF SATELLITE,[DEGREES] LATITUDE OF SATELLITE,[M] ALTITUDE OF SATELLITE,[DEGREES] ANGLE TO EARTH LIMB,[DEGREES] NADIR RIGHT ASCENSION,[DEGREES] NADIR DECLINATION,[DEGREES] NADIR LONGITUDE,[DEGREES] NADIR LATITUDE,[DEGREES] SOLAR RIGHT ASCENSION,[DEGREES] SOLAR DECLINATION,...,[DEGREES] SOLAR LONGITUDE,[DEGREES] SOLAR LATITUDE,[DEGREES] LUNAR RIGHT ASCENSION,[DEGREES] LUNAR DECLINATION,[DEGREES] LUNAR ALTITUDE,[DEGREES] LUNAR AZIMUTH,[DEGREES] LUNAR LONGITUDE,[DEGREES] LUNAR LATITUDE,[DEGREES] LUNAR-TARGET ANGULAR SEPERATION,[nT] MAGNETIC FIELD STRENGTH
0,-96.4876,-37.0312,825230.187,22.346,226.8791,37.0312,106.1988,-65.6456,144.3902,14.1689,...,SOL_PHI,-6.9662,128.8311,24.6626,-8.6114,65.5178,-165.4208,-8.7262,144.322,23037.761
1,-97.045,-38.7731,825372.393,22.428,226.4471,38.7731,108.7028,-67.1102,144.3906,14.1688,...,SOL_PHI,-6.9663,128.8157,24.6802,-9.6373,65.9931,-165.3998,-8.7351,144.302,23608.611
2,-97.624,-40.5132,825515.283,22.515,225.9934,40.5132,111.524,-68.5304,144.3909,14.1687,...,SOL_PHI,-6.9664,128.7994,24.6973,-10.6571,66.5196,-165.3785,-8.743,144.281,24207.669
3,-98.2272,-42.2513,825658.971,22.607,225.5156,42.2513,114.7169,-69.8969,144.3912,14.1686,...,SOL_PHI,-6.9664,128.7824,24.7137,-11.6701,67.0999,-165.3567,-8.7501,144.261,24833.059
4,-98.8574,-43.9872,825803.571,22.704,225.0107,43.9872,118.343,-71.1979,144.3915,14.1685,...,SOL_PHI,-6.9665,128.7645,24.7295,-12.6756,67.7372,-165.3347,-8.7562,144.242,25482.847


In [5]:
df_source.tail()

Unnamed: 0,[DEGREES] LONGITUDE OF SATELLITE,[DEGREES] LATITUDE OF SATELLITE,[M] ALTITUDE OF SATELLITE,[DEGREES] ANGLE TO EARTH LIMB,[DEGREES] NADIR RIGHT ASCENSION,[DEGREES] NADIR DECLINATION,[DEGREES] NADIR LONGITUDE,[DEGREES] NADIR LATITUDE,[DEGREES] SOLAR RIGHT ASCENSION,[DEGREES] SOLAR DECLINATION,...,[DEGREES] SOLAR LONGITUDE,[DEGREES] SOLAR LATITUDE,[DEGREES] LUNAR RIGHT ASCENSION,[DEGREES] LUNAR DECLINATION,[DEGREES] LUNAR ALTITUDE,[DEGREES] LUNAR AZIMUTH,[DEGREES] LUNAR LONGITUDE,[DEGREES] LUNAR LATITUDE,[DEGREES] LUNAR-TARGET ANGULAR SEPERATION,[nT] MAGNETIC FIELD STRENGTH
6185,27.6955,43.2818,827920.933,34.298,47.3359,-43.2818,-67.304,71.4463,146.402,13.492,...,SOL_PHI,-7.4022,153.8776,14.7208,22.0045,-90.2347,168.8504,-12.2722,150.556,32742.878
6186,27.048,45.0153,827578.822,34.151,46.8138,-45.0153,-63.4011,72.7535,146.4024,13.4919,...,SOL_PHI,-7.4023,153.9058,14.6971,22.3596,-91.3462,168.8138,-12.2667,150.561,33191.927
6187,26.3678,46.7463,827236.158,33.997,46.2589,-46.7463,-58.9014,73.9738,146.4027,13.4918,...,SOL_PHI,-7.4023,153.9347,14.674,22.6915,-92.5065,168.777,-12.2621,150.564,33621.949
6188,25.6508,48.4746,826893.523,33.837,45.6673,-48.4746,-53.7158,75.0857,146.403,13.4917,...,SOL_PHI,-7.4024,153.9643,14.6515,22.9998,-93.7191,168.7399,-12.2584,150.567,34033.209
6189,24.892,50.1998,826551.496,33.671,45.0338,-50.1998,-47.7688,76.0634,146.4033,13.4916,...,SOL_PHI,-7.4025,153.9946,14.6297,23.284,-94.9878,168.7027,-12.2556,150.568,34426.219


In [6]:
df_source.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6190 entries, 0 to 6189
Data columns (total 22 columns):
 #   Column                                     Non-Null Count  Dtype  
---  ------                                     --------------  -----  
 0   [DEGREES] LONGITUDE OF SATELLITE           6190 non-null   float64
 1   [DEGREES] LATITUDE OF SATELLITE            6190 non-null   float64
 2   [M] ALTITUDE OF SATELLITE                  6190 non-null   float64
 3   [DEGREES] ANGLE TO EARTH LIMB              6190 non-null   float64
 4   [DEGREES] NADIR RIGHT ASCENSION            6190 non-null   float64
 5   [DEGREES] NADIR DECLINATION                6190 non-null   float64
 6   [DEGREES] NADIR LONGITUDE                  6190 non-null   float64
 7   [DEGREES] NADIR LATITUDE                   6190 non-null   float64
 8   [DEGREES] SOLAR RIGHT ASCENSION            6190 non-null   float64
 9   [DEGREES] SOLAR DECLINATION                6190 non-null   float64
 10  [DEGREES] SOLAR ALTITUDE

In [7]:
df_source.describe()

Unnamed: 0,[DEGREES] LONGITUDE OF SATELLITE,[DEGREES] LATITUDE OF SATELLITE,[M] ALTITUDE OF SATELLITE,[DEGREES] ANGLE TO EARTH LIMB,[DEGREES] NADIR RIGHT ASCENSION,[DEGREES] NADIR DECLINATION,[DEGREES] NADIR LONGITUDE,[DEGREES] NADIR LATITUDE,[DEGREES] SOLAR RIGHT ASCENSION,[DEGREES] SOLAR DECLINATION,[DEGREES] SOLAR LATITUDE,[DEGREES] LUNAR RIGHT ASCENSION,[DEGREES] LUNAR DECLINATION,[DEGREES] LUNAR ALTITUDE,[DEGREES] LUNAR AZIMUTH,[DEGREES] LUNAR LONGITUDE,[DEGREES] LUNAR LATITUDE,[DEGREES] LUNAR-TARGET ANGULAR SEPERATION,[nT] MAGNETIC FIELD STRENGTH
count,6190.0,6190.0,6190.0,6190.0,6190.0,6190.0,6190.0,6190.0,6190.0,6190.0,6190.0,6190.0,6190.0,6190.0,6190.0,6190.0,6190.0,6190.0,6190.0
mean,-0.830746,-0.594771,827007.91309,27.764884,149.796696,0.594771,-13.331339,-0.212073,145.397584,13.831687,-7.185057,141.46102,20.062738,-1.208315,-0.743887,-25.280519,-10.673977,148.739658,31859.840266
std,103.477409,50.17029,5696.156735,4.976773,94.214683,50.17029,95.081307,48.835934,0.581274,0.195586,0.125973,7.747596,2.606041,20.858502,94.541565,171.422968,1.704647,1.771469,7617.005944
min,-179.9915,-81.2779,818500.32,19.74,0.0666,-81.2779,-179.607,-77.9823,144.3902,13.4916,-7.4025,127.2122,14.6297,-36.8426,-179.8389,-179.9953,-14.0726,143.935,16651.98
25%,-93.626475,-45.04535,821555.92625,22.87325,55.485125,-43.596075,-105.1514,-43.955475,144.894475,13.662525,-7.2942,134.808225,17.911025,-21.40695,-81.223075,-173.408775,-12.024275,147.57425,25261.439
50%,1.8192,-0.74215,826749.259,27.77,171.37055,0.74215,-11.5942,-0.3065,145.39775,13.83245,-7.18535,141.48805,20.23545,-1.45255,-64.3861,-166.92695,-10.7412,149.2385,32505.936
75%,86.6852,43.596075,832495.26625,32.676,235.24655,45.04535,74.956925,43.502525,145.901025,14.001175,-7.0761,148.23835,22.2725,18.959525,80.88275,173.3286,-9.32885,150.093,39100.43825
max,179.968,81.2779,835817.266,35.775,359.6796,81.2779,179.963,77.987,146.4033,14.1689,-6.9662,155.1864,24.8906,34.9171,179.8763,179.994,-6.9444,151.205,45017.367


In [8]:
# examine cardinality

df_source.nunique()

[DEGREES] LONGITUDE OF SATELLITE             6186
[DEGREES] LATITUDE OF SATELLITE              6180
[M] ALTITUDE OF SATELLITE                    6188
[DEGREES] ANGLE TO EARTH LIMB                5075
[DEGREES] NADIR RIGHT ASCENSION              6167
[DEGREES] NADIR DECLINATION                  6180
[DEGREES] NADIR LONGITUDE                    6176
[DEGREES] NADIR LATITUDE                     6175
[DEGREES] SOLAR RIGHT ASCENSION              6190
[DEGREES] SOLAR DECLINATION                  6169
[DEGREES] SOLAR ALTITUDE                        1
[DEGREES] SOLAR AZIMUTH                         1
[DEGREES] SOLAR LONGITUDE                       1
[DEGREES] SOLAR LATITUDE                     4358
[DEGREES] LUNAR RIGHT ASCENSION              6113
[DEGREES] LUNAR DECLINATION                  5979
[DEGREES] LUNAR ALTITUDE                     6162
[DEGREES] LUNAR AZIMUTH                      6164
[DEGREES] LUNAR LONGITUDE                    6106
[DEGREES] LUNAR LATITUDE                     5889


In [9]:
df_source.isna().sum()

[DEGREES] LONGITUDE OF SATELLITE             0
[DEGREES] LATITUDE OF SATELLITE              0
[M] ALTITUDE OF SATELLITE                    0
[DEGREES] ANGLE TO EARTH LIMB                0
[DEGREES] NADIR RIGHT ASCENSION              0
[DEGREES] NADIR DECLINATION                  0
[DEGREES] NADIR LONGITUDE                    0
[DEGREES] NADIR LATITUDE                     0
[DEGREES] SOLAR RIGHT ASCENSION              0
[DEGREES] SOLAR DECLINATION                  0
[DEGREES] SOLAR ALTITUDE                     0
[DEGREES] SOLAR AZIMUTH                      0
[DEGREES] SOLAR LONGITUDE                    0
[DEGREES] SOLAR LATITUDE                     0
[DEGREES] LUNAR RIGHT ASCENSION              0
[DEGREES] LUNAR DECLINATION                  0
[DEGREES] LUNAR ALTITUDE                     0
[DEGREES] LUNAR AZIMUTH                      0
[DEGREES] LUNAR LONGITUDE                    0
[DEGREES] LUNAR LATITUDE                     0
[DEGREES] LUNAR-TARGET ANGULAR SEPERATION    0
[nT] MAGNETIC

### Data Cleaning and Pre-Processing

In [10]:
# Identifier les variables dépendantes et indépendantes
df_target = df_source[TARGET_FEATURE]
df_predictors = df_source.drop([TARGET_FEATURE], axis=1)

In [11]:
# drop categorical features to simplify the process
df_predictors = df_predictors.drop(CATEGORICAL_FEATURES, axis=1)

In [12]:
from sklearn.model_selection import train_test_split

# Séparer le jeu de données en jeu de donnée d'entraînement et de test
X_train, X_test, y_train, y_test = train_test_split(
    df_predictors, 
    df_target,
    train_size=0.8, 
    test_size=0.2, 
    random_state=10,
)

In [13]:
# impute missing values
from sklearn.impute import SimpleImputer

imputer = SimpleImputer()
imputed_X_train = imputer.fit_transform(X_train)
imputed_X_test = imputer.transform(X_test)

In [14]:
# normalize data
from sklearn import preprocessing

scaler = preprocessing.StandardScaler()
scaled_imputed_X_train_plus = scaler.fit_transform(imputed_X_train)
scaled_imputed_X_test_plus = scaler.transform(imputed_X_test)

### Baseline Model training and validation

In [15]:
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
model.fit(scaled_imputed_X_train_plus, y_train)

In [16]:
preds = model.predict(scaled_imputed_X_test_plus)

In [17]:
# evaluate model
from sklearn.metrics import mean_absolute_error

mean_absolute_error(y_test, preds)

208.73079600161512

In [18]:
print("First five predictions: ", ", ".join(str(p) for p in preds[:5].tolist()))

First five predictions:  21480.796009999976, 27793.791519999988, 40499.44030999999, 44174.8597, 31212.520230000002
