# Mission to Mars

## Web Scraping

This notebook scrapes five websites about Mars.

The pages retrieved are:

1. Web page of Mars mission news (https://mars.nasa.gov/news/)

2. Featured image of JPL Mars Space Images (https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars)

3. Mars Weather (https://twitter.com/marswxreport?lang=en)

4. Mars facts (https://space-facts.com/mars/)

5. Mars hemispheres (https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars)

In [1]:
# Import dependencies
import pandas as pd
from bs4 import BeautifulSoup
import requests
import time

In [2]:
#Dependencies to navigate with chrome driver
from splinter import Browser
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)

In [3]:
#Dependencies to navigate with gecko driver
#Try with gecko driver
from selenium import webdriver
#Create the driver
driver = webdriver.Firefox()



## Nasa Mars News

This section scrapes the latest news about the Mars mission program.

In [4]:
# URL of page to be scraped
url_news = 'https://mars.nasa.gov/news/?page=0&per_page=40&order=publish_date+desc%2Ccreated_at+desc&search=&category=19%2C165%2C184%2C204&blank_scope=Latest'

# Retrieve page with the requests module
response_news = requests.get(url_news)

# Create BeautifulSoup object; parse with 'lxml'
soup_news = BeautifulSoup(response_news.text, 'lxml')

# Extract the div tags with class "slide"
results_news = soup_news.find_all('div', class_='slide')

# Extract the title and description of each item, store them in a dictionary,
# and append the dictionary to a list
mars_news = []

for result in results_news:
    # Scrape the title
    title = result.find("div", class_="content_title").a.get_text(strip=True)
    print(title)
    # Scrape the paragraph text
    paragraph = result.find("div", class_="rollover_description_inner").get_text(strip=True)
    print(paragraph)
    print("--------")
    # Create a dictionary for this news item
    news = {
        "news_title": title,
        "news_p": paragraph
    }
    # Append the dictionary to the list
    mars_news.append(news)

# Title of the first (latest) news item
news_title = mars_news[0]["news_title"]
# Paragraph text of the first (latest) news item
news_p = mars_news[0]["news_p"]

# Print them
print(f"Some mars news: {news_title} , {news_p}")


NASA to Broadcast Mars 2020 Perseverance Launch, Prelaunch Activities
Starting July 27, news activities will cover everything from mission engineering and science to returning samples from Mars to, of course, the launch itself.
--------
The Launch Is Approaching for NASA's Next Mars Rover, Perseverance
The Red Planet's surface has been visited by eight NASA spacecraft. The ninth will be the first that includes a roundtrip ticket in its flight plan.
--------
NASA to Hold Mars 2020 Perseverance Rover Launch Briefing
Learn more about the agency's next Red Planet mission during a live event on June 17.
--------
Alabama High School Student Names NASA's Mars Helicopter
Vaneeza Rupani's essay was chosen as the name for the small spacecraft, which will mark NASA's first attempt at powered flight on another planet.
--------
Mars Helicopter Attached to NASA's Perseverance Rover
The team also fueled the rover's sky crane to get ready for this summer's history-making launch.
--------
NASA's Persev
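
The loop above assumes that every `slide` div contains both a title link and a description. If either is missing, `find` returns `None` and the cell stops with an `AttributeError`. A minimal sketch of a more defensive version follows. It runs against a small hand-written HTML snippet rather than the live page, and the names `SAMPLE_HTML` and `parse_news` are illustrative assumptions, not part of the original notebook.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup that mimics the structure scraped above (assumption).
SAMPLE_HTML = """
<div class="slide">
  <div class="rollover_description_inner">First description.</div>
  <div class="content_title"><a href="/news/1">First title</a></div>
</div>
<div class="slide">
  <div class="content_title"><a href="/news/2">Second title</a></div>
</div>
"""


def parse_news(html):
    """Return a list of {'news_title', 'news_p'} dicts, skipping incomplete slides."""
    soup = BeautifulSoup(html, "html.parser")
    news = []
    for slide in soup.find_all("div", class_="slide"):
        title_div = slide.find("div", class_="content_title")
        text_div = slide.find("div", class_="rollover_description_inner")
        # Skip slides where either element is absent instead of raising.
        if title_div is None or title_div.a is None or text_div is None:
            continue
        news.append({
            "news_title": title_div.a.get_text(strip=True),
            "news_p": text_div.get_text(strip=True),
        })
    return news


mars_news_sample = parse_news(SAMPLE_HTML)
print(mars_news_sample)
```

The second sample slide has no description, so only the first one ends up in the list.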

## JPL Mars Space Images - Featured image

This section scrapes the web page of NASA's Jet Propulsion Laboratory to find the featured image.

In [5]:
# Set up splinter: declare the executable path
from splinter import Browser
executable_path = {'executable_path': 'chromedriver.exe'}
browser = Browser('chrome', **executable_path, headless=False)

# Save the URL
url_image = 'https://www.jpl.nasa.gov/spaceimages/?search=&category=Mars'
# Open the website through the chrome driver
browser.visit(url_image)

# Base URL that is missing from the relative image links
first_url = "https://www.jpl.nasa.gov"

# Create an empty list to store the URLs
images_url = []

# Iterate through the pages
for x in range(2):
    # HTML of the current page
    html = browser.html
    # Parse the HTML with Beautiful Soup
    soup_image = BeautifulSoup(html, 'html.parser')
    # Retrieve all elements that contain image information
    results_image = soup_image.find_all('li', class_="slide")
    # Use Beautiful Soup's find method to retrieve the image link of each element
    for result in results_image:
        try:
            anchor = result.find('a', class_="fancybox")["data-fancybox-href"]
            c_url = f"{first_url}{anchor}"
            print(anchor)
            print("----")
            images_url.append(c_url)
        except (TypeError, KeyError):
            # No matching anchor, or the anchor lacks the attribute
            print("NO URL")

    # Click the "MORE" button to load more images
    try:
        browser.click_link_by_partial_text('MORE')
    except Exception:
        print("Scraping complete")

# Take the first image in the list as the featured image
featured_image_url = images_url[0]

# Print it
print(f"The final url is {featured_image_url}")

/spaceimages/images/largesize/PIA24156_hires.jpg
----
/spaceimages/images/largesize/PIA24155_hires.jpg
----
/spaceimages/images/largesize/PIA24099_hires.jpg
----
/spaceimages/images/largesize/PIA24098_hires.jpg
----
/spaceimages/images/largesize/PIA24152_hires.jpg
----
/spaceimages/images/largesize/PIA24149_hires.jpg
----
/spaceimages/images/largesize/PIA24148_hires.jpg
----
/spaceimages/images/largesize/PIA24147_hires.jpg
----
/spaceimages/images/largesize/PIA24146_hires.jpg
----
/spaceimages/images/largesize/PIA24151_hires.jpg
----
/spaceimages/images/largesize/PIA24145_hires.jpg
----
/spaceimages/images/largesize/PIA24144_hires.jpg
----
/spaceimages/images/largesize/PIA24143_hires.jpg
----
/spaceimages/images/largesize/PIA24142_hires.jpg
----
/spaceimages/images/largesize/PIA24141_hires.jpg
----
/spaceimages/images/largesize/PIA24125_hires.jpg
----
/spaceimages/images/largesize/PIA24091_hires.jpg
----
/spaceimages/images/largesize/PIA24040_hires.jpg
----
/spaceimages/images/largesiz
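
The cell above builds absolute URLs by concatenating the base URL with each relative path. As a hedged alternative, the standard library's `urllib.parse.urljoin` does the same job and leaves links that are already absolute untouched. The sketch below uses two relative paths copied from the output above. The third entry is a made-up absolute URL added only to show that behaviour, and the fallback to `None` for an empty list is likewise an assumption.

```python
from urllib.parse import urljoin

base_url = "https://www.jpl.nasa.gov"

# Two relative paths from the output above, plus one made-up absolute URL
# (the example.com entry is an assumption for illustration only).
hrefs = [
    "/spaceimages/images/largesize/PIA24156_hires.jpg",
    "/spaceimages/images/largesize/PIA24155_hires.jpg",
    "https://example.com/image.jpg",
]

# urljoin resolves relative paths against the base and keeps absolute URLs as-is.
absolute_urls = [urljoin(base_url, href) for href in hrefs]

# Guard against an empty list before indexing.
featured = absolute_urls[0] if absolute_urls else None
print(featured)
```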




## Mars weather
This section scrapes the Mars Weather Twitter account to find the latest Mars weather tweet. The tweet text for the weather report is saved in a variable called `mars_weather`.

In [6]:
# Set the Twitter URL
url_weather ='https://twitter.com/marswxreport?lang=en'
# Navigate to the web page with the gecko driver
driver.get(url_weather)
# Wait for the page to load, then find the weather tweet by class name
time.sleep(2)
content = driver.find_element_by_class_name('css-1dbjc4n')
time.sleep(2)


In [7]:
# Convert the element to text
result_weather = content.text
# Position of the word "InSight", where the weather report starts
start = result_weather.find("InSight")
# Position of "hPa", where the weather report ends
end = result_weather.find("hPa")
# Slice out the report; +3 includes the three characters of "hPa"
mars_weather = result_weather[start:end+3]

In [8]:
#Lets print the final result
print(mars_weather)

InSight sol 676 (2020-10-21) low -96.9ºC (-142.4ºF) high -16.5ºC (2.3ºF)
winds from the W at 8.9 m/s (19.8 mph) gusting to 26.9 m/s (60.2 mph)
pressure at 7.50 hPa
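
One caveat about the slicing above: `str.find` returns `-1` when a marker is absent, so a page without a weather tweet would silently produce a wrong slice instead of an error. A minimal sketch of a guarded version follows. The helper name `extract_report` is hypothetical, and the sample text is modelled on the tweet printed above with made-up surrounding lines added as noise.

```python
def extract_report(text, start_marker="InSight", end_marker="hPa"):
    """Return the substring from start_marker through end_marker, or None."""
    start = text.find(start_marker)
    if start == -1:
        return None
    # Search for the end marker only after the start marker.
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end + len(end_marker)]


# Tweet text from the output above; the first and last lines are invented noise.
page_text = (
    "Mars Weather @MarsWxReport\n"
    "InSight sol 676 (2020-10-21) low -96.9ºC (-142.4ºF) high -16.5ºC (2.3ºF)\n"
    "winds from the W at 8.9 m/s (19.8 mph) gusting to 26.9 m/s (60.2 mph)\n"
    "pressure at 7.50 hPa\n"
    "12 Retweets"
)

mars_weather_sample = extract_report(page_text)
print(mars_weather_sample)

# A page without the markers yields None rather than a misleading slice.
print(extract_report("no report here"))
```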


## Mars Facts

This section reads a table of facts about Mars and converts it into an HTML table with pandas.

The web page is https://space-facts.com/mars/.

In [9]:
# URL of the Mars facts page
url_mt = 'https://space-facts.com/mars/'
# Read all tables on the web page with pandas
tables = pd.read_html(url_mt)
# Keep the first table and name its columns
df = tables[0]
df.columns = ['Mars fact', 'Value']
print(df)
# Convert the DataFrame into an HTML table
html_table = df.to_html()
html_table

              Mars fact                          Value
0  Equatorial Diameter:                       6,792 km
1       Polar Diameter:                       6,752 km
2                 Mass:  6.39 × 10^23 kg (0.11 Earths)
3                Moons:            2 (Phobos & Deimos)
4       Orbit Distance:       227,943,824 km (1.38 AU)
5         Orbit Period:           687 days (1.9 years)
6  Surface Temperature:                   -87 to -5 °C
7         First Record:              2nd millennium BC
8          Recorded By:           Egyptian astronomers


'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>Mars fact</th>\n      <th>Value</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>Equatorial Diameter:</td>\n      <td>6,792 km</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>Polar Diameter:</td>\n      <td>6,752 km</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>Mass:</td>\n      <td>6.39 × 10^23 kg (0.11 Earths)</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>Moons:</td>\n      <td>2 (Phobos &amp; Deimos)</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>Orbit Distance:</td>\n      <td>227,943,824 km (1.38 AU)</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>Orbit Period:</td>\n      <td>687 days (1.9 years)</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>Surface Temperature:</td>\n      <td>-87 to -5 °C</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>First Record:</td>\n      <td>2nd millennium BC
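
The HTML string above includes the numeric index column and literal `\n` characters. If the table is meant to be embedded in a web page, a possible refinement is to drop the index and strip the newlines. The sketch below uses three rows copied from the table above. The names `df_sample` and `html_sample` are illustrative, and whether the original project needs this cleanup is an assumption.

```python
import pandas as pd

# Three rows copied from the table printed above.
df_sample = pd.DataFrame(
    [
        ("Equatorial Diameter:", "6,792 km"),
        ("Polar Diameter:", "6,752 km"),
        ("Moons:", "2 (Phobos & Deimos)"),
    ],
    columns=["Mars fact", "Value"],
)

# index=False drops the 0, 1, 2... column; removing "\n" gives a one-line string.
html_sample = df_sample.to_html(index=False).replace("\n", "")
print(html_sample)
```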

## Mars hemispheres

This section scrapes the USGS Astrogeology web page to find high-resolution images of each of Mars's hemispheres.

In [10]:
# URL of page to be scraped
url_hem = 'https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars'

# Retrieve page with the requests module
response_hem = requests.get(url_hem)

# Create BeautifulSoup object; parse with 'lxml'
soup_hem = BeautifulSoup(response_hem.text, 'lxml')

# Extract the anchor tags with classes "itemLink product-item"
results_hem = soup_hem.find_all('a', class_="itemLink product-item")

principal_url = []
first_hem = "https://astrogeology.usgs.gov"
# Build the absolute URL of each hemisphere page
for result in results_hem:
    anchor_hem = result["href"]
    hem_url = f"{first_hem}{anchor_hem}"
    print(anchor_hem)
    print(hem_url)
    print("----")
    principal_url.append(hem_url)

# Visit each hemisphere page with requests inside a for loop
# Create an empty list to store the results
hem_list = []

for address in principal_url:
    print(address)
    # Retrieve page with the requests module
    aux_response = requests.get(address)
    # Create BeautifulSoup object; parse with 'lxml'
    aux_soup = BeautifulSoup(aux_response.text, 'lxml')
    # Extract the title
    header = aux_soup.find_all('h2', class_="title")
    title = header[0].text
    # Extract the URL of the first download link
    downloads = aux_soup.find_all('div', class_='downloads')
    link = downloads[0].a['href']
    # Create a dictionary
    hem_dict = {"title": title, "img_url": link}
    # Add it to the list
    hem_list.append(hem_dict)

print(hem_list)

/search/map/Mars/Viking/cerberus_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/cerberus_enhanced
----
/search/map/Mars/Viking/schiaparelli_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/schiaparelli_enhanced
----
/search/map/Mars/Viking/syrtis_major_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/syrtis_major_enhanced
----
/search/map/Mars/Viking/valles_marineris_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/valles_marineris_enhanced
----
https://astrogeology.usgs.gov/search/map/Mars/Viking/cerberus_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/schiaparelli_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/syrtis_major_enhanced
https://astrogeology.usgs.gov/search/map/Mars/Viking/valles_marineris_enhanced
[{'title': 'Cerberus Hemisphere Enhanced', 'img_url': 'https://astropedia.astrogeology.usgs.gov/download/Mars/Viking/cerberus_enhanced.tif/full.jpg'}, {'title': 'Schiaparelli Hemispher
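
The per-page extraction in the second loop indexes `header[0]` and `downloads[0]` directly, which raises `IndexError` if a page lacks either element. A minimal sketch of the same step as a reusable helper is shown here, tested on hand-written markup. The markup, the `example.com` link, and the name `parse_hemisphere` are assumptions for illustration and do not reproduce the real USGS page.

```python
from bs4 import BeautifulSoup

# Hand-written markup imitating one hemisphere detail page (assumption),
# reduced to the two elements the loop above reads.
DETAIL_HTML = """
<h2 class="title">Cerberus Hemisphere Enhanced</h2>
<div class="downloads">
  <ul><li><a href="https://example.com/cerberus/full.jpg">Sample</a></li></ul>
</div>
"""


def parse_hemisphere(html):
    """Return {'title', 'img_url'} for one detail page, or None if incomplete."""
    soup = BeautifulSoup(html, "html.parser")
    header = soup.find("h2", class_="title")
    downloads = soup.find("div", class_="downloads")
    # Take the first link with an href inside the downloads block, if any.
    link = downloads.find("a", href=True) if downloads is not None else None
    if header is None or link is None:
        return None
    return {"title": header.get_text(strip=True), "img_url": link["href"]}


print(parse_hemisphere(DETAIL_HTML))
print(parse_hemisphere("<p>empty page</p>"))
```

Returning `None` for incomplete pages lets the calling loop decide whether to skip the page or stop.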