# MarsToday: evaluating Mars climate through REMS sensor onboard Curiosity Mars rover - DATA EXTRACTION

A Mars rover is a motor vehicle designed to travel on the surface of Mars. Rovers have several advantages over stationary landers: they examine more territory, they can be directed to interesting features, they can place themselves in sunny positions to weather winter months, and they can advance the knowledge of how to perform very remote robotic vehicle control.
Curiosity landed in Gale crater on Mars. The landing site coordinates are 4.5895°S 137.4417°E. The location was named Bradbury Landing on 22 August 2012, in honor of science fiction author Ray Bradbury. Gale, an impact crater estimated to be 3.5 to 3.8 billion years old, is hypothesized to have been gradually filled in by sediments, first water-deposited and then wind-deposited, possibly until it was completely covered.

<img src="../images/Gale_crater.jpg" width="200"><img src="../images/Curiosity.jpg" width="250">

Curiosity carries many instruments. One of them, REMS (Rover Environmental Monitoring Station), measures and provides daily and seasonal reports on atmospheric pressure, humidity, ultraviolet radiation at the Martian surface, air temperature, and ground temperature around the rover. REMS was developed in Spain by the Centro de Astrobiología (CAB/CSIC-INTA) in collaboration with NASA and JPL-Caltech.

The data in this project represent the weather conditions on Mars from Sol 1 (August 7, 2012 on Earth) to Sol 1977 (February 27, 2018 on Earth). A sol is one Martian day, which lasts about 24 h 40 min.
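
To make the sol/Earth-date relation concrete, here is a minimal sketch that converts a sol number into an approximate Earth date. It assumes that sols are counted from Curiosity's touchdown (taken here as 2012-08-06 05:17:57 UTC, with touchdown on Sol 0) and uses the mean sol length of 24 h 39 min 35.244 s. The function name `sol_to_earth_date` is made up for this illustration and is not part of the project's code.

```python
from datetime import datetime, timedelta, timezone

# Mean length of a Martian solar day in Earth seconds (24 h 39 min 35.244 s).
SOL_SECONDS = 88775.244

# Assumption: sols are counted from touchdown, taken as 2012-08-06 05:17:57 UTC (Sol 0).
LANDING_UTC = datetime(2012, 8, 6, 5, 17, 57, tzinfo=timezone.utc)


def sol_to_earth_date(sol):
    """Return the approximate UTC date reached `sol` Martian days after touchdown."""
    return (LANDING_UTC + timedelta(seconds=sol * SOL_SECONDS)).date()


print(sol_to_earth_date(1))     # 2012-08-07
print(sol_to_earth_date(1977))  # 2018-02-27
```

Because a sol is roughly 40 minutes longer than an Earth day, the sol count slowly falls behind the count of Earth days: 1977 sols span about 2031 Earth days.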

However, REMS does not take measurements continuously, and the measurement times vary from one day to the next. For various reasons (instrument maintenance, calibration, degradation, etc.), some or all of the magnitudes in this project are not available for certain sols.

# Objective

The main questions I considered are the following:

- How is the weather on Mars?
- How does it compare to its twin location on Earth?
- Is it possible to predict the weather with the missing data?
- BONUS: can I obtain pictures from Mars and complement the data?

# Data acquisition

The base data was extracted from Kaggle, "Mars weather data" by Kannan.K.R. Source: https://www.kaggle.com/datasets/imkrkannan/mars-weather-data

The data was complemented with weather data from Papua New Guinea (twin location of Curiosity on Earth) available on NOAA Global Surface Summary of the Day services, in the same range of dates as provided by Kannan.K.R. Source: https://www.ncei.noaa.gov/access/search/data-search/global-summary-of-the-day

Data prediction and current weather information were extracted using Selenium from the CAB-CSIC/INTA webpage. Source: http://cab.inta-csic.es/rems//

Let's code!

# Import box

In [1]:
# General Imports

from Functions import *

In [32]:
#Selenium imports

from ScrappingREMS import *

# Data from Mars

Data imported from Kaggle: https://www.kaggle.com/code/davidbnn92/weather-data

The cleaning steps are the following:

- Removed undesired columns: used the "remove_columns" custom function
- Created a new column: "Mean_temp"
- Cleaned the atmosphere column: used the "clean_atmosphere" custom function
- Renamed the columns: used the "rename_columns" custom function
- Cleaned the month column: used the "clean_month" custom function
- Created a new column: "Season", derived from the Month values

## Data first visualization

In [46]:
mars = pd.read_csv("../data/mars-weather.csv")
mars.sample(5)

Unnamed: 0,id,terrestrial_date,sol,ls,month,min_temp,max_temp,pressure,wind_speed,atmo_opacity
462,1433,2016-11-07,1513,256,Month 9,-74.0,-1.0,907.0,,Sunny
227,1667,2017-07-08,1749,30,Month 2,-76.0,-21.0,867.0,,Sunny
568,1329,2016-07-21,1407,189,Month 7,-74.0,-9.0,779.0,,Sunny
440,1455,2016-11-30,1535,270,Month 10,-73.0,-1.0,900.0,,Sunny
1261,634,2014-07-17,691,162,Month 6,-76.0,-8.0,741.0,,Sunny


## Data cleaning

In [4]:
# Remove undesired columns

columns_to_remove = ["wind_speed", "id", "ls"]

for c in columns_to_remove:
    remove_columns(mars, c)
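
The `remove_columns` helper comes from `Functions.py` and its body is not shown in this notebook. A minimal sketch of what such a helper could look like, assuming it simply drops one column in place (the body below is a guess, not the project's actual code):

```python
import pandas as pd


def remove_columns(df, column):
    """Hypothetical stand-in: drop a single column from df in place."""
    df.drop(columns=column, inplace=True)


# Small demo frame with made-up rows that mimic the raw Mars table.
demo = pd.DataFrame({"id": [1433, 1667], "ls": [256, 30], "min_temp": [-74.0, -76.0]})
for c in ["id", "ls"]:
    remove_columns(demo, c)
print(list(demo.columns))  # ['min_temp']
```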

In [5]:
# Created a new column, average temperature

mars["Mean_temp"] = ((mars["min_temp"] + mars["max_temp"])/2)

In [6]:
# Cleaning the atmosphere column

clean_atmosphere(mars,"atmo_opacity","--","0")

Unnamed: 0,terrestrial_date,sol,month,min_temp,max_temp,pressure,atmo_opacity,Mean_temp
947,2015-06-16,1016,Month 12,-76.0,-11.0,847.0,Sunny,-43.5
379,2017-01-31,1596,Month 11,-74.0,-6.0,850.0,Sunny,-40.0
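
The call above passes a column name, a value to look for (`"--"`) and its replacement (`"0"`). Since `clean_atmosphere` is defined in `Functions.py` and not shown here, the following is only a sketch of the idea, assuming the helper does an exact-match replacement on the column:

```python
import pandas as pd


def clean_atmosphere(df, column, old, new):
    """Hypothetical stand-in: replace every exact match of `old` with `new`."""
    df[column] = df[column].replace(old, new)


demo = pd.DataFrame({"atmo_opacity": ["Sunny", "--", "Sunny"]})
clean_atmosphere(demo, "atmo_opacity", "--", "0")
print(demo["atmo_opacity"].tolist())  # ['Sunny', '0', 'Sunny']
```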


In [7]:
# Rename columns

oldname = ["terrestrial_date", "sol", "month", "min_temp", "max_temp", "pressure", "atmo_opacity"]
newname = ["Earth Date", "Sol", "Month", "Min_temp", "Max_temp", "Pressure", "Atmo_opacity"]

for o, n in zip(oldname, newname):
    rename_columns(mars, o, n)
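
For reference, the same renaming can also be expressed with plain pandas, without the custom helper. This is a sketch on a small demo frame; it is an alternative formulation, not the code the project uses:

```python
import pandas as pd

oldname = ["terrestrial_date", "sol", "month"]
newname = ["Earth Date", "Sol", "Month"]

demo = pd.DataFrame({"terrestrial_date": ["2018-02-27"], "sol": [1977], "month": ["Month 5"]})

# Build an {old: new} mapping and rename all columns in one call.
demo = demo.rename(columns=dict(zip(oldname, newname)))
print(list(demo.columns))  # ['Earth Date', 'Sol', 'Month']
```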

In [8]:
# Cleaning the month column

clean_month(mars, "Month")

Unnamed: 0,Earth Date,Sol,Month,Min_temp,Max_temp,Pressure,Atmo_opacity,Mean_temp
185,2017-08-21,1792,2,-80.0,-27.0,883.0,Sunny,-53.5
143,2017-10-03,1834,3,-80.0,-25.0,875.0,Sunny,-52.5
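
Comparing the tables before and after this step, `clean_month` turns labels such as `Month 9` into `9`. The helper itself lives in `Functions.py`; a minimal sketch, assuming it keeps only the digits and leaves them as strings:

```python
import pandas as pd


def clean_month(df, column):
    """Hypothetical stand-in: keep only the month number, as a string."""
    df[column] = df[column].str.extract(r"(\d+)", expand=False)


demo = pd.DataFrame({"Month": ["Month 9", "Month 10", "Mes 3"]})
clean_month(demo, "Month")
print(demo["Month"].tolist())  # ['9', '10', '3']
```

Extracting the digits, rather than removing a fixed prefix, lets the same sketch handle both the English `Month` labels and the Spanish `Mes` labels that appear in the scraped widget data.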


In [9]:
# Create the Season column from the Month values

seasons = ["Winter", "Spring", "Summer", "Autumn"]

mars["Season"] = mars["Month"]

# Months 1-3 -> Winter, 4-6 -> Spring, 7-9 -> Summer, 10-12 -> Autumn
for i in range(1, 13):
    mars["Season"] = mars["Season"].replace(f"{i}", seasons[(i - 1) // 3])

In [10]:
mars

Unnamed: 0,Earth Date,Sol,Month,Min_temp,Max_temp,Pressure,Atmo_opacity,Mean_temp,Season
0,2018-02-27,1977,5,-77.0,-10.0,727.0,Sunny,-43.5,Spring
1,2018-02-26,1976,5,-77.0,-10.0,728.0,Sunny,-43.5,Spring
2,2018-02-25,1975,5,-76.0,-16.0,729.0,Sunny,-46.0,Spring
3,2018-02-24,1974,5,-77.0,-13.0,729.0,Sunny,-45.0,Spring
4,2018-02-23,1973,5,-78.0,-18.0,730.0,Sunny,-48.0,Spring
...,...,...,...,...,...,...,...,...,...
1889,2012-08-18,12,6,-76.0,-18.0,741.0,Sunny,-47.0,Spring
1890,2012-08-17,11,6,-76.0,-11.0,740.0,Sunny,-43.5,Spring
1891,2012-08-16,10,6,-75.0,-16.0,739.0,Sunny,-45.5,Spring
1892,2012-08-15,9,6,,,,Sunny,,Spring


Data cleaned! :)

In [11]:
mars.to_csv("../data/mars-weather-cleaned.csv", index = False)

# Data from Earth

Extracted from NOAA database: https://www.ncei.noaa.gov/access/search/data-search/global-summary-of-the-day

The cleaning steps are the following:

- Removed undesired columns: used the "remove_columns" custom function
- Renamed the columns: used the "rename_columns" custom function
- Converted temperature values from Fahrenheit to Celsius: used the "FtoC" custom function
- Converted pressure from mbar to pascals: used the "mbartoPa" custom function
- Removed rows with the 9999.9 missing-value placeholder from the Pressure column
- Rounded the values of the Mean_temp column

## Data first visualization

In [12]:
earth = pd.read_csv("../data/papua-weather.csv")

In [13]:
earth

Unnamed: 0,STATION,DATE,MAX,MIN,SLP,TEMP
0,92035099999,2012-08-07,84.2,71.6,1013.0,77.3
1,92035099999,2012-08-08,86.9,73.4,1011.9,78.6
2,92035099999,2012-08-09,84.2,73.4,1012.0,77.8
3,92035099999,2012-08-10,78.8,71.6,1012.9,74.3
4,92035099999,2012-08-11,78.8,71.6,1012.8,74.2
...,...,...,...,...,...,...
1980,92035099999,2018-02-23,91.4,77.0,9999.9,83.8
1981,92035099999,2018-02-24,91.4,77.0,9999.9,85.4
1982,92035099999,2018-02-25,93.2,75.2,1007.1,81.8
1983,92035099999,2018-02-26,91.4,77.0,1006.9,82.8


## Data cleaning

In [14]:
# Deleted Station column

remove_columns(earth, "STATION")

Unnamed: 0,DATE,MAX,MIN,SLP,TEMP
964,2015-04-21,87.8,75.2,1007.9,79.8
361,2013-08-17,87.8,73.4,1009.7,79.5


In [15]:
# Rename columns

rename_columns(earth, "DATE", "Earth Date")
rename_columns(earth, "MAX", "Max_temp")
rename_columns(earth, "MIN", "Min_temp")
rename_columns(earth, "TEMP", "Mean_temp")
rename_columns(earth, "SLP", "Pressure")

Unnamed: 0,Earth Date,Max_temp,Min_temp,Pressure,Mean_temp
1698,2017-05-15,89.6,77.0,1007.0,82.6
1490,2016-10-11,87.8,77.0,1008.5,81.2


In [16]:
# Converting values from Fahrenheit to Celsius

columns = ["Max_temp","Min_temp","Mean_temp"]
for i in columns:
    FtoC(earth, i)
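
The conversion applied here is C = (F - 32) × 5/9. `FtoC` is defined in `Functions.py` and not shown, so the body below is an assumed minimal version; the demo values are taken from the raw Earth table above.

```python
import pandas as pd


def FtoC(df, column):
    """Hypothetical stand-in: convert a Fahrenheit column to Celsius in place."""
    df[column] = (df[column] - 32) * 5 / 9


demo = pd.DataFrame({"Max_temp": [91.4, 86.9], "Min_temp": [77.0, 73.4]})
for col in ["Max_temp", "Min_temp"]:
    FtoC(demo, col)
print(demo.round(1))
```

The rounded results (33.0 and 30.5 for the maxima, 25.0 and 23.0 for the minima) agree with the converted values shown in the notebook output for the same dates.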

In [17]:
# Converting values from mBar to Pascals

columns = ["Pressure"]
for i in columns:
    mbartoPa(earth,i)
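
One millibar equals 100 pascals, so the conversion is a single multiplication. As with the other helpers, `mbartoPa` is not shown in this notebook; this is an assumed minimal version:

```python
import pandas as pd


def mbartoPa(df, column):
    """Hypothetical stand-in: convert millibar to pascal (1 mbar = 100 Pa)."""
    df[column] = df[column] * 100


demo = pd.DataFrame({"Pressure": [1006.9, 9999.9]})
mbartoPa(demo, "Pressure")
print(demo["Pressure"].round(1).tolist())  # [100690.0, 999990.0]
```

Note that the 9999.9 placeholder for a missing reading is scaled as well, which is why it shows up as 999990.0 in the converted table.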

In [18]:
# Sorting the rows by descending date

earth.sort_values(by=["Earth Date"], ascending = False, inplace=True)
earth.reset_index(drop = True)

Unnamed: 0,Earth Date,Max_temp,Min_temp,Pressure,Mean_temp
0,2018-02-27,33.0,24.0,100720.0,27.333333
1,2018-02-26,33.0,25.0,100690.0,28.222222
2,2018-02-25,34.0,24.0,100710.0,27.666667
3,2018-02-24,33.0,25.0,999990.0,29.666667
4,2018-02-23,33.0,25.0,999990.0,28.777778
...,...,...,...,...,...
1980,2012-08-11,26.0,22.0,101280.0,23.444444
1981,2012-08-10,26.0,22.0,101290.0,23.500000
1982,2012-08-09,29.0,23.0,101200.0,25.444444
1983,2012-08-08,30.5,23.0,101190.0,25.888889


In [19]:
# Remove rows where Pressure holds the missing-value placeholder (9999.9 mbar = 999990.0 Pa)

earth.drop(earth.index[earth["Pressure"] == 999990.0], inplace = True)
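
The same filtering can be written as a boolean mask, which some readers find easier to follow than `drop` with an index lookup. This is a sketch on a small demo frame and an alternative formulation, not the notebook's own code; the constant name `MISSING_PA` is made up for the example.

```python
import pandas as pd

# 9999.9 mbar marks a missing sea-level pressure; after conversion it reads 999990.0 Pa.
MISSING_PA = 999990.0

demo = pd.DataFrame({
    "Earth Date": ["2018-02-25", "2018-02-24", "2018-02-23"],
    "Pressure": [100710.0, 999990.0, 999990.0],
})

# Keep only the rows with a real pressure reading and renumber the index.
demo = demo[demo["Pressure"] != MISSING_PA].reset_index(drop=True)
print(demo)
```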

In [20]:
# Round values from Mean_temp

roundval(earth, "Mean_temp", 1)
earth.reset_index(drop = True)

Unnamed: 0,Earth Date,Max_temp,Min_temp,Pressure,Mean_temp
0,2018-02-27,33.0,24.0,100720.0,27.3
1,2018-02-26,33.0,25.0,100690.0,28.2
2,2018-02-25,34.0,24.0,100710.0,27.7
3,2018-02-22,34.0,24.0,100690.0,27.5
4,2018-02-21,32.0,23.0,100570.0,27.2
...,...,...,...,...,...
1750,2012-08-11,26.0,22.0,101280.0,23.4
1751,2012-08-10,26.0,22.0,101290.0,23.5
1752,2012-08-09,29.0,23.0,101200.0,25.4
1753,2012-08-08,30.5,23.0,101190.0,25.9


Data cleaned :)!

In [21]:
#earth.to_csv("../data/papua-weather-cleaned.csv", index = False)

# Scraping REMS data

Both datasets had to be complemented with additional information taken directly from webpages.

This information is crucial for the last phase of the project, "Mars Today", in order to predict the weather of Mars beyond the limit date of the Kaggle dataframe.

The scraping was performed on the REMS widget on the CAB-REMS webpage, http://cab.inta-csic.es/rems/es/, using Selenium.

In [None]:
#ScrappingREMS()

In [37]:
widget = pd.read_csv("../data/widget.csv")

## Scraped data visualization

In [38]:
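# Keep the first 1548 rows; .copy() avoids SettingWithCopyWarning when columns are added later
widget = widget.iloc[0:1548].copy()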
widget.sample(5)

Unnamed: 0,Earth Date,Sol,Month,Min_temp,Max_temp,Pressure,Atmo_opacity
1032,2019-10-07,2548,Mes 3,-81,-28,831,Soleado
309,2021-11-08,3291,Mes 5,-77,-12,735,Soleado
615,2020-12-15,2972,Mes 12,-69,-5,818,Soleado
1415,2018-07-15,2111,Mes 8,-67,-20,813,Soleado
661,2020-10-28,2925,Mes 11,-69,-2,844,Soleado


## Data cleaning

In [39]:
#Cleaning the month

clean_month(widget, "Month")

Unnamed: 0,Earth Date,Sol,Month,Min_temp,Max_temp,Pressure,Atmo_opacity
940,2020-01-09,2640,5,-75,-18,727,Soleado
324,2021-10-24,3276,4,-77,-19,752,Soleado


In [40]:
# Creating the Season column from the Month values

seasons = ["Winter", "Spring", "Summer", "Autumn"]

widget["Season"] = widget["Month"]

# Months 1-3 -> Winter, 4-6 -> Spring, 7-9 -> Summer, 10-12 -> Autumn
for i in range(1, 13):
    widget["Season"] = widget["Season"].replace(f"{i}", seasons[(i - 1) // 3])

In [41]:
# Clean the atmo column
clean_atmosphere(widget,"Atmo_opacity","Soleado","Sunny")

Unnamed: 0,Earth Date,Sol,Month,Min_temp,Max_temp,Pressure,Atmo_opacity,Season
1421,2018-07-08,2105,7,-61,-25,802,Sunny,Summer
1415,2018-07-15,2111,8,-67,-20,813,Sunny,Summer


In [42]:
# Remove rows where the widget reported no value ("Valor no disponible" = "Value not available")
widget.drop(widget.index[widget["Min_temp"] == "Valor no disponible"], inplace = True)

In [43]:
# Converting strings to floats
columns = ["Min_temp", "Max_temp", "Pressure"]
for i in columns:
    floatify(widget, i)
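
The scraped values arrive as text, so they have to be cast to numbers before the mean can be computed. `floatify` is defined in `Functions.py` and not shown; a minimal sketch, assuming it is a plain cast to float:

```python
import pandas as pd


def floatify(df, column):
    """Hypothetical stand-in: cast a column of numeric strings to float."""
    df[column] = df[column].astype(float)


demo = pd.DataFrame({"Min_temp": ["-81", "-77"], "Pressure": ["831", "735"]})
for col in ["Min_temp", "Pressure"]:
    floatify(demo, col)
print(demo.dtypes)
```

A plain cast like this raises an error on text such as "Valor no disponible", which is why those rows are removed in the previous step.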

In [44]:
# Create the Mean column

widget["Mean_temp"] = ((widget["Min_temp"] + widget["Max_temp"])/2)

In [45]:
widget

Unnamed: 0,Earth Date,Sol,Month,Min_temp,Max_temp,Pressure,Atmo_opacity,Season,Mean_temp
0,2022-10-19,3627,1,-67.0,-9.0,803.0,Sunny,Winter,-38.0
1,2022-10-18,3626,1,-67.0,-12.0,804.0,Sunny,Winter,-39.5
2,2022-10-17,3625,1,-67.0,-11.0,804.0,Sunny,Winter,-39.0
3,2022-10-16,3624,1,-67.0,-12.0,806.0,Sunny,Winter,-39.5
4,2022-10-15,3623,1,-67.0,-12.0,808.0,Sunny,Winter,-39.5
...,...,...,...,...,...,...,...,...,...
1543,2018-03-05,1983,5,-76.0,-8.0,723.0,Sunny,Spring,-42.0
1544,2018-03-04,1982,5,-77.0,-7.0,724.0,Sunny,Spring,-42.0
1545,2018-03-03,1981,5,-75.0,-10.0,725.0,Sunny,Spring,-42.5
1546,2018-03-02,1980,5,-77.0,-11.0,725.0,Sunny,Spring,-44.0


Data cleaned!

In [31]:
widget.to_csv("../data/widget-cleaned.csv", index = False)