# Scraping Seismic Images from the Virtual Seismic Atlas (VSA)


### Sharing Geoscience Data for a changing world.

![VSA](https://i.imgur.com/LuJXwiM.png)

## Introduction

### Problem Statement:

Datasets are essential for all scientists to conduct research, especially geoscientists who code. Research is an inquiry-based process that includes recognising a question, gathering data, analysing and evaluating results, drawing conclusions, and sharing the knowledge gained. The ability to conduct research depends mainly on the available datasets. A massive amount of open-source data is available online. However, it is often challenging for students and researchers to navigate these datasets, collect the data and download it, mainly because data discoverability is poor, documentation is sometimes lacking, and licences can be confusing. With these two projects, I hope to contribute toward solving these problems.

To this end, I have carried out the following web scraping project:

##### VSA web scraping:  

I have web scraped the Virtual Seismic Atlas (VSA) https://www.seismicatlas.org/ to build a seismic image dataset for machine learning analysis, using a convolutional neural network (CNN) to distinguish between faults and folds and to discriminate between salt, sedimentary layers, noise and basement.  

For more information about the machine learning work, please see our abstract at https://meetingorganizer.copernicus.org/EGU21/EGU21-6385.html and the related paper. 

### Example of Seismic line 

![Example of Seismic line.png](https://i.imgur.com/rcS9ra2.jpg)

### Seismic Interpretation

Seismic interpretation is the extraction of subsurface geologic information from seismic data. 

Reflection seismic data comprise:
1. Continuity of reflections indicating geologic structure.
2. Variability of reflections indicating stratigraphy, fluids and reservoir fabric.
3. The seismic wavelet.
4. Noise of various kinds and data defects.

Seismic interpretation is the thoughtful procedure of separating these effects.

For more information see: [What Is Seismic Interpretation?](https://explorer.aapg.org/story/articleid/2471/what-is-seismic-interpretation#:~:text=Seismic%20Interpretation%20is%20the%20extraction%20of%20subsurface%20geologic%20information%20from%20seismic%20data.&text=The%20danger%20in%20seismic%20interpretation,of%20reflections%20indicating%20geologic%20structure.) by Satinder Chopra and Alistair R. Brown (May 2013).

### A comparison of raw vs. interpreted seismic sections:

Architecture and image characteristics of a normal fault - Inner Moray Firth Seismic Interpretation by Rob Butler.

![Seismic section](https://i.imgur.com/HvWkyei.jpg)

These interpretations display a correlation of units across the fault zone - which has synthetic splays and an offset relay structure at depth. The growth strata are chiefly of upper Jurassic in age - with the deepest identified unit part of the middle Jurassic (locally) pre-rift stratigraphy. The annotations identify some "typical" characteristics of faults imaged in 2D seismic data. [Rob Butler, 2015 (VSA)](https://www.seismicatlas.org/entity?id=e5ba8d3c-84e1-4ef7-a154-7788ed483e8f)

### Overview:

The website https://www.seismicatlas.org/ provides geological interpretations of seismic data and contains a large number of seismic sections. In this project, we retrieve seismic images from this site and build a dataset containing the seismic image files and all the related information, using web scraping: the process of extracting information from a website in an automated fashion using code. We use the Python libraries `Requests` (https://pypi.org/project/requests/) and `Beautiful Soup` (https://www.crummy.com/software/BeautifulSoup/bs4/doc/) to scrape data from the site.

### VSA:

The VSA is a resource for sharing the geological interpretation of seismic data. By browsing freely through the site you will find seismic images and interpretations. And you can download higher resolution images for your own use - all without signing in. There are no membership fees, all we ask is that you respect the intellectual property rights of the contributors who have posted images on the VSA. Just hit the link "Search" to start!

Use the VSA to find images of subsurface structures, submarine landscapes, the structure of oceanic water-bodies and much more. You can search for analogues of subsurface geology, rapidly contrasting geological features, finding areas of controversy. You'll find results from cutting-edge technology and breaking research alongside historically important images of the subsurface.

The VSA is used by thousands of geoscientists each week. It's a great place to promote your own science, and for companies to showcase datasets and expertise. It's simple to author new content and link to your research webpages or related publications.

[Professor Rob Butler](https://www.abdn.ac.uk/people/rob.butler).

### What is web scraping?
Web scraping is a term used to describe the use of a program or algorithm to extract and process large amounts of data from the web. Whether you are a data scientist, engineer, or anybody who analyzes large amounts of datasets, the ability to scrape data from the web is a useful skill to have.

### Why is python used for scraping?
Automated web scraping can speed up the data collection process: you write your code once, and it retrieves the information you want many times and from many pages. Python is one of the most popular programming languages for web scraping and handles crawling and scraping tasks comfortably. `Requests` and `BeautifulSoup` are among the most widely used Python libraries for this purpose.

### General project workflow:
1. Choose a website and describe the project objective
2. Create a list of all the seismic survey URLs using the characters from a to z.
3. Download the webpage using requests.
4. Parse the HTML source code using Beautiful Soup
5. Extract survey names, information and URLs from the page
6. Compile extracted information into Python lists and dictionaries
7. Extract and combine data from multiple survey pages
8. Save the extracted information to a CSV file.

### Expected results:
By the end of the project, we will have created a CSV file in the following format:

Title,Data Type,Author,Date,Image url,Description,Info url
"Crustal Structure, West Orkney Basin, imaged by MOIST",Interpretation,Rob Butler,2021-03-12,https://seismicatlas.org///uploaded/image/202103/ba8eb087-bbce-4efe-903f-f5b5fd69509f.jpg,"An interpretation of the first profile acquired by the BIRPS programme in 1981: MOIST. It images the crustal structure of the West Orkney Basin area, offshore N Scotland. Compare with interpretations of other BIRPS profiles that were acquired over the following years, with increasingly effective acquisition and processing strategies (WINCH-1, DRUM and GRID-17).",https://seismicatlas.org///entity?id=cad7a17f-d9c3-430e-83c2-7731247529ad
Deep structure - Outer Isles Fault - from WINCH 1 profile,Interpretation,Rob Butler,2021-03-12,https://seismicatlas.org///uploaded/image/202103/9f048ab2-7295-41cd-8224-68a1527c2a5e.jpg,"An interpretation of BIRPS' WINCH-1 profile (from 1982) that images the crust crust around the Outer Isles Fault, offshore N Scotland - along with the enigmatic sub-Moho structure.",https://seismicatlas.org///entity?id=5daf55c7-7d75-47e9-9efc-ec65076b8139
"Deep Structure - West Orkney Basin, imaged on ultradeep seismic",Interpretation,Rob Butler,2021-03-12,https://seismicatlas.org///uploaded/image/202103/c50021dd-96b9-4612-af7f-0d1b89d31fb2.jpg,"An interpretation of the BIRPS DRUM line (from 1984) - showing the crustal structure (rift basins developed across the Caledonian deformation front, offshore N Scotland) and the underlying, still enigmatic, sub-Moho reflective zones (Flannan and ""W"").",https://seismicatlas.org///entity?id=92b4fd80-b159-4e97-b77b-6cdc412b017a

![csv](https://i.imgur.com/PuYWt8O.png)

### Running the code:
You can execute the code using the "Run" button at the top of this page. You can make changes and save your own version of the notebook to [Jovian](https://jovian.ai) by executing the following cells. You can then run it on Binder, Colab (Google's cloud infrastructure), or Kaggle.

In [78]:
!pip install jovian --upgrade --quiet

In [79]:
import jovian

In [80]:
# Execute this to save new versions of the notebook
jovian.commit(project="seismic-data-processing")


[jovian] Attempting to save notebook..
[jovian] Updating notebook "ramysaleem/seismic-data-processing" on https://jovian.ai
[jovian] Uploading notebook..
[jovian] Uploading additional files...
[jovian] Committed successfully! https://jovian.ai/ramysaleem/seismic-data-processing


'https://jovian.ai/ramysaleem/seismic-data-processing'

## Project Method:

### 1. Detailed VSA web page workflow

1. The first step of the project is to scrape the Virtual Seismic Atlas (VSA) https://www.seismicatlas.org/.

2. Second, we obtained a list of the geological categories that cover all the available seismic data and selected faults (https://www.seismicatlas.org/search?f=1062&s) and folds (https://www.seismicatlas.org/search?f=1039&s=), which contain the seismic image data we are interested in. 

3. Third, we scraped the Deformation Structures: Faults search results to collect 392 seismic images.

4. Next, we extracted the Title, Data Type, Author, Date, Image URL, Description and Info URL from the website.

5. The VSA website contains a search icon that opens the https://www.seismicatlas.org/search?f=&s= web page. This search page has multiple filters under which all the seismic images are grouped.

6. We used the browser's Inspect tool to identify the base URL, then appended page numbers (0, 1, 2, ...) to get all the seismic images for each category. Iterating over these numbers gives, for example, the 16 pages of fault images.

7. From the fault web pages (https://www.seismicatlas.org/search?s=faults&pn=0), we started collecting the needed information.

8. We created a function to get the section title using the `a` tag.

9. Another function was implemented to get each image's data type (its label) using the `span` tag with `class` "label".

10. We also created a function that collects the seismic section's author name and date using the `td` tag.

11. Moreover, we collected the image URL using the `img` tag with `class` "img-responsive", the description using the `p` tag, and finally the info page URL.

12. Then, we created several functions that get all the seismic image information needed.

13. Additionally, we created several functions that use a for loop to pass the list of page URLs into the functions above and return a list.

14. Finally, we created a function that writes all the information to a CSV file, which we later open as a DataFrame using the pandas library.
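Step 14 can be sketched with the standard library's `csv.DictWriter`. This is a minimal sketch, not the notebook's own function: it assumes each record is a dictionary whose keys match the header in *Expected results*, and the name `write_records_csv` is hypothetical.

```python
import csv

# Column order matching the expected CSV header shown earlier
FIELDNAMES = ['Title', 'Data Type', 'Author', 'Date',
              'Image url', 'Description', 'Info url']


def write_records_csv(records, path, fieldnames=FIELDNAMES):
    """Write a list of dicts to `path` as CSV (hypothetical helper)."""
    # newline='' lets the csv module control line endings;
    # utf-8 keeps non-ASCII characters in titles and descriptions intact
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for record in records:
            writer.writerow(record)
    return len(records)
```

With the default quoting, `DictWriter` wraps fields that contain commas or double quotes in quotes and doubles any inner quotes, which is why titles such as "Crustal Structure, West Orkney Basin, imaged by MOIST" and the `""W""` description appear that way in the expected output.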

### 2. Download the webpage using `requests`

We installed and imported the requests library to download the web page.
The library can be installed using `pip`.

In [98]:
!pip install requests --upgrade --quiet

In [4]:
import requests

In [5]:
url = 'https://seismicatlas.org/search'

To download a page, we can use the `get` function from requests, which returns a response object.

In [6]:
response = requests.get(url)

`requests.get` returns a response object containing the data from the web page and some other information.

The `.status_code` property can be used to check whether the request was successful. A successful response has an [HTTP status code](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes) between 200 and 299.

In [90]:
response.status_code

200
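The 200-299 rule can be wrapped in a small helper; `is_success` is a hypothetical name used only for illustration. Note that `requests` also offers `response.ok`, but it is `True` unless the status code is a 4xx or 5xx error, so it accepts 3xx codes too; `response.raise_for_status()` raises `requests.HTTPError` for 4xx and 5xx responses.

```python
def is_success(status_code: int) -> bool:
    """Return True for HTTP 2xx status codes (200-299 inclusive)."""
    return 200 <= status_code <= 299


# Checking a few status codes; in the notebook this would be
# is_success(response.status_code)
for code in (200, 204, 301, 404, 500):
    print(code, is_success(code))
```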

We can also save the page to a file and view it locally within Jupyter using "File > Open".

In [91]:
with open('webpage.html', 'w', encoding='utf-8') as f:
    f.write(response.text)

### 3. Using `Beautiful Soup` to parse the VSA website and select the page that contains the different structures

## [VSA](https://www.seismicatlas.org/)

We scraped the VSA main search page to explore it and find the structures page that holds the seismic image data.

We installed and imported the `beautifulsoup4` library to parse the web page.
The library can be installed using `pip`.

In [97]:
!pip install beautifulsoup4 --upgrade --quiet

In [92]:
from bs4 import BeautifulSoup

Then we parsed the web page and saved the output into `doc`, which is a Beautiful Soup object. 

In [93]:
doc = BeautifulSoup(response.text, 'html.parser')

#### 3.1 Image Sections:

Now let us collect the sections by getting the `div` tag with `class` "content".

In [94]:
content_div = doc.find('div', class_='content')

In [95]:
sections = content_div.find_all('section')

In [12]:
len(sections)

25

In [99]:
sections[0]

<section>
<figure class="thumbnail"><a href="/entity?id=6810547c-3de2-4ece-a840-0158ddbab8f3"><img alt="" class="img-responsive" src="/uploaded/image/200909/0d54ec4a-f3d8-4ffd-99eb-2ad79a66e38d.jpg"/></a></figure>
<h4>
<span><a href="/entity?id=6810547c-3de2-4ece-a840-0158ddbab8f3">FIRE 1 CMP 11 000-20 000</a></span>
<span class="label label-regional">Regional Project</span>
</h4>
<table class="summary">
<tbody><tr>
<th>VSA Author:</th>
<td></td>
</tr>
<tr>
<th>Scenes:</th>
<td>0</td>
</tr>
<tr>
<th>Interpretations:</th>
<td>1</td>
</tr>
<tr>
<th>Date Created:</th>
<td>2009-09-01</td>
</tr>
</tbody></table>
<p>The southern part of the FIRE profile 1 covers CMPs 11 000-20 000 (c. 275-500 km). The grey scale image is of a migrated section, published in Kukkonen et al. 2006 (see docs &amp; links for publication link).

</p>
</section>

Here we create a function that extracts the image information from a section.

In [14]:
base_url = 'https://seismicatlas.org'

def parse_section(section):
    title = section.find_all('a')[1].text
    data_type = section.find('span', class_='label').text
    description = section.find('p').text.strip()
    url = base_url + section.find('a')['href']
    return {'title': title, 'type': data_type, 'description': description, 'url': url }

In [15]:
parse_section(sections[0])

{'title': 'FIRE 1 CMP 11 000-20 000',
 'type': 'Regional Project',
 'description': 'The southern part of the FIRE profile 1 covers CMPs 11 000-20 000 (c. 275-500 km). The grey scale image is of a migrated section, published in Kukkonen et al. 2006 (see docs & links for publication link).',
 'url': 'https://seismicatlas.org/entity?id=6810547c-3de2-4ece-a840-0158ddbab8f3'}

After that, we use a list comprehension to parse all the sections on the page.

In [101]:
sections_data = [parse_section(section) for section in sections]

### 3.2 Sections DataFrame using Pandas

After collecting all the required information, we used the `Pandas` library to create a DataFrame from the collected data.

To install the library inside the notebook, use `pip`; then `import` it as `pd` for short.

In [104]:
!pip install Pandas --quiet --upgrade

In [105]:
import pandas as pd

In [106]:
sections_df = pd.DataFrame(sections_data)

In [107]:
sections_df

| | title | type | description | url |
|---|---|---|---|---|
| 0 | FIRE 1 CMP 11 000-20 000 | Regional Project | The southern part of the FIRE profile 1 covers... | https://seismicatlas.org/entity?id=6810547c-3d... |
| 1 | Iberseis profile | Regional Project | A deep seismic profile through part of souther... | https://seismicatlas.org/entity?id=0ba1b701-af... |
| 2 | Crustal Structure, West Orkney Basin, imaged b... | Interpretation | An interpretation of the first profile acquire... | https://seismicatlas.org/entity?id=cad7a17f-d9... |
| 3 | Deep structure - Outer Isles Fault - from WINC... | Interpretation | An interpretation of BIRPS' WINCH-1 profile (f... | https://seismicatlas.org/entity?id=5daf55c7-7d... |
| 4 | Deep Structure - West Orkney Basin, imaged on ... | Interpretation | An interpretation of the BIRPS DRUM line (from... | https://seismicatlas.org/entity?id=92b4fd80-b1... |
| 5 | Deep structure Western Orkney Basin - from GRI... | Interpretation | Interpretation of the GRID 17 profile (aka Syn... | https://seismicatlas.org/entity?id=d89526c0-b2... |
| 6 | Shallow subsurface MTCs | Regional Project | Variance extraction coloured by subsurface ele... | https://seismicatlas.org/entity?id=3117237a-35... |
| 7 | ESCIN - 2 profile | Regional Project | Deep seismic profile from northern Iberia. | https://seismicatlas.org/entity?id=a8895dfe-08... |
| 8 | Differential vertical movement with mobile salt | Interpretation | Annotated interpretation of stratigraphic rela... | https://seismicatlas.org/entity?id=4b096fa8-2d... |
| 9 | Salt mobility - imaging flaps | Interpretation | An annotated interpretation, for discussion in... | https://seismicatlas.org/entity?id=ca5d068a-92... |


### 3.3 VSA web pages CSV file

We created a CSV file with the extracted information to store the data.

In [110]:
sections_df.to_csv('projects.csv', index=False)

In [111]:
!head projects.csv

title,type,description,url
FIRE 1 CMP 11 000-20 000,Regional Project,"The southern part of the FIRE profile 1 covers CMPs 11 000-20 000 (c. 275-500 km). The grey scale image is of a migrated section, published in Kukkonen et al. 2006 (see docs & links for publication link).",https://seismicatlas.org/entity?id=6810547c-3de2-4ece-a840-0158ddbab8f3
Iberseis profile,Regional Project,A deep seismic profile through part of southern Iberia.,https://seismicatlas.org/entity?id=0ba1b701-afd8-4511-be51-7bc88034a0d7
"Crustal Structure, West Orkney Basin, imaged by MOIST",Interpretation,"An interpretation of the first profile acquired by the BIRPS programme in 1981: MOIST. It images the crustal structure of the West Orkney Basin area, offshore N Scotland. Compare with interpretations of other BIRPS profiles that were acquired over the following years, with increasingly effective acquisition and processing strategies (WINCH-1, DRUM and GRID-17).",https://seismicatlas.org/entity?id=cad7a17f-d9c3-4

In [112]:
# Execute this to save new versions of the notebook

jovian.commit(project="seismic-data-processing")


[jovian] Attempting to save notebook..
[jovian] Updating notebook "ramysaleem/seismic-data-processing" on https://jovian.ai
[jovian] Uploading notebook..
[jovian] Uploading additional files...
[jovian] Committed successfully! https://jovian.ai/ramysaleem/seismic-data-processing


'https://jovian.ai/ramysaleem/seismic-data-processing'

### 4. Getting information on fault interpretation images

![faults](https://i.imgur.com/ZgwoXy0.jpg)

Let us now get the information about fault interpretations by appending "?f=1003&s&pn=" to the search URL.

In [114]:
fault_page_url = url + '?f=1003&s&pn='
fault_page_url

'https://seismicatlas.org/search?f=1003&s&pn='

The VSA contains 392 entries interpreted as faults. They are spread over 16 pages of 25 results each, numbered from 0 to 15. Now let us create a list of all the fault web pages.

In [115]:
faults_url = []

for i in range(0, 16):
    faults_url.append(fault_page_url + str(i))


In [116]:
faults_url

['https://seismicatlas.org/search?f=1003&s&pn=0',
 'https://seismicatlas.org/search?f=1003&s&pn=1',
 'https://seismicatlas.org/search?f=1003&s&pn=2',
 'https://seismicatlas.org/search?f=1003&s&pn=3',
 'https://seismicatlas.org/search?f=1003&s&pn=4',
 'https://seismicatlas.org/search?f=1003&s&pn=5',
 'https://seismicatlas.org/search?f=1003&s&pn=6',
 'https://seismicatlas.org/search?f=1003&s&pn=7',
 'https://seismicatlas.org/search?f=1003&s&pn=8',
 'https://seismicatlas.org/search?f=1003&s&pn=9',
 'https://seismicatlas.org/search?f=1003&s&pn=10',
 'https://seismicatlas.org/search?f=1003&s&pn=11',
 'https://seismicatlas.org/search?f=1003&s&pn=12',
 'https://seismicatlas.org/search?f=1003&s&pn=13',
 'https://seismicatlas.org/search?f=1003&s&pn=14',
 'https://seismicatlas.org/search?f=1003&s&pn=15']
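Rather than hard-coding 16 pages, the page count could be derived from the results banner on the first page ("Results: 1-25 of 392", visible further down in the `faults_content_div` output). With 392 results at 25 per page, ceil(392 / 25) = ceil(15.68) = 16 pages, numbered 0 to 15. This is a sketch assuming the banner keeps that wording; `count_pages` and `build_page_urls` are hypothetical helpers.

```python
import math
import re

# Matches banners such as "Results: 1-25 of 392" (assumed wording)
RESULTS_RE = re.compile(r'Results:\s*(\d+)\s*-\s*(\d+)\s+of\s+(\d+)')


def count_pages(banner_text: str) -> int:
    """Number of result pages implied by a 'Results: a-b of N' banner.

    Use the banner from the first page (pn=0): the last page may show
    fewer results, which would give the wrong page size.
    """
    match = RESULTS_RE.search(banner_text)
    if match is None:
        raise ValueError('No results banner found: {!r}'.format(banner_text))
    first, last, total = (int(g) for g in match.groups())
    per_page = last - first + 1          # 25 on the first page
    return math.ceil(total / per_page)


def build_page_urls(page_url_prefix: str, n_pages: int) -> list:
    """Append page numbers 0..n_pages-1 to a URL ending in 'pn='."""
    return [page_url_prefix + str(i) for i in range(n_pages)]


n_pages = count_pages('Results: 1-25 of 392')
urls = build_page_urls('https://seismicatlas.org/search?f=1003&s&pn=', n_pages)
print(n_pages, urls[0], urls[-1])
```

On the live page, the banner text could presumably be read with `doc_faults_1.find('p', class_='form-control-static').get_text()`, since that is the tag that holds it in the HTML shown below.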

### 4.1 Fault image information

![faults on seismic interpretation](https://i.imgur.com/sAIwuw6.jpg)

Let's get the information for the collected fault images.

In [122]:
faults_url_1 = faults_url[0]
faults_url_1

'https://seismicatlas.org/search?f=1003&s&pn=0'

In [123]:
f_responses_1 = requests.get(faults_url_1)
f_responses_1 

<Response [200]>

In [124]:
f_responses_1.text[:1000]

'<!DOCTYPE html>\n\n<html>\n<head>\n    <meta charset="utf-8" />\n    <title>VSA - Search</title>\n    <meta content="IE=edge" http-equiv="X-UA-Compatible" />\n    <meta content="width=device-width, initial-scale=1" name="viewport" />\n    \n    <link href="/resources/css/bootstrap.min.css" media="screen" rel="stylesheet" />\n\t<link href="/resources/css/bootstrap-theme.min.css" media="screen" rel="stylesheet" />\n\t<link href="/resources/css/theme.css" media="screen" rel="stylesheet" />\n\t    \n    <script src="../../../resources/js/modernizr.min.js"></script>\n    <!--[if lt IE 9]>\n    <script src="../../../resources/js/respond.min.js" th:src="@{/resources/js/respond.min.js}"></script>\n    <![endif]-->\n</head>\n<body>\n\t\n\t<header>\n\t\t<div class="container" role="header">\n\n\t\t\t<div class="row">\n\t\t\t\t<div class="col-sm-6">\n\t\t\t\t\t<div class="logo">\n\t\t\t\t\t\t<a class="nohover" href="/"><img alt="" height="50" width="143" src="/resources/css/vsa-logo.png" />\n\t\

In [125]:
len(f_responses_1.text)

75968

In [126]:
faults_url_data_1 = f_responses_1.text

In [127]:
with open('first_faults_data_page.html', 'w', encoding='utf-8') as f:
    f.write(faults_url_data_1)

In [128]:
doc_faults_1 = BeautifulSoup(faults_url_data_1, 'html.parser')
doc_faults_1 

<!DOCTYPE html>

<html>
<head>
<meta charset="utf-8"/>
<title>VSA - Search</title>
<meta content="IE=edge" http-equiv="X-UA-Compatible"/>
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<link href="/resources/css/bootstrap.min.css" media="screen" rel="stylesheet"/>
<link href="/resources/css/bootstrap-theme.min.css" media="screen" rel="stylesheet"/>
<link href="/resources/css/theme.css" media="screen" rel="stylesheet"/>
<script src="../../../resources/js/modernizr.min.js"></script>
<!--[if lt IE 9]>
    <script src="../../../resources/js/respond.min.js" th:src="@{/resources/js/respond.min.js}"></script>
    <![endif]-->
</head>
<body>
<header>
<div class="container" role="header">
<div class="row">
<div class="col-sm-6">
<div class="logo">
<a class="nohover" href="/"><img alt="" height="50" src="/resources/css/vsa-logo.png" width="143"/>
<h3>Virtual Seismic Atlas</h3>
<p>Sharing the geological interpretation of seismic data</p>
</a>
</div>
</div>
<div class="col-s

In [129]:
faults_content_div = doc_faults_1.find('div', class_='content')
faults_content_div

<div class="col-xs-9 content">
<div class="facet-label">
<a class="btn btn-xxxs btn-danger" href="https://seismicatlas.org/search?pn=0&amp;f=&amp;s="><span class="glyphicon glyphicon-remove"></span></a>
<span>Deformation Structures:</span>
<a class="text" href="https://seismicatlas.org/search?pn=0&amp;f=&amp;s=" title="Click to Remove">Faults</a>
</div>
<small>Use the options on the left to narrow your search further, or to widen your search, you can remove previous selections from the list above.</small>
<br/>
<br/><br/>
<form action="" class="pagination-heading" method="get" role="form">
<input name="f" type="hidden" value="1003"/>
<input name="pn" type="hidden" value="0"/>
<div class="row">
<div class="col-md-3 col-sm-6 nowrap">
<div class="form-group">
<p class="form-control-static"><b>Results: 1-25 of 392</b></p>
</div>
</div>
<div class="col-md-3 col-sm-6 pull-right">
<div class="pull-right">
<div class="btn-group">
<button class="btn btn-default btn-sm dropdown-toggle" data-togg

In [130]:
faults_sections = faults_content_div.find_all('section')
faults_sections

[<section>
 <figure class="thumbnail"><a href="/entity?id=cad7a17f-d9c3-430e-83c2-7731247529ad"><img alt="" class="img-responsive" src="/uploaded/image/202103/ba8eb087-bbce-4efe-903f-f5b5fd69509f.jpg"/></a></figure>
 <h4>
 <span><a href="/entity?id=cad7a17f-d9c3-430e-83c2-7731247529ad">Crustal Structure, West Orkney Basin, imaged by MOIST</a></span>
 <span class="label label-interpretation">Interpretation</span>
 </h4>
 <table class="summary">
 <tbody><tr>
 <th>VSA Author:</th>
 <td>Rob Butler</td>
 </tr>
 <tr>
 <th>Date Created:</th>
 <td>2021-03-12</td>
 </tr>
 </tbody></table>
 <p>An interpretation of the first profile acquired by the BIRPS programme in 1981: MOIST. It images the crustal structure of the West Orkney Basin area, offshore N Scotland. Compare with interpretations of other BIRPS profiles that were acquired over the following years, with increasingly effective acquisition and processing strategies (WINCH-1, DRUM and GRID-17). </p>
 </section>,
 <section>
 <figure class="

In [131]:
faults_title_tags = faults_sections[0].find_all('a')[1].text
faults_title_tags

'Crustal Structure, West Orkney Basin, imaged by MOIST'

In [132]:
faults_label_tags = faults_sections[0].find_all('span', {'class' : "label"})[0].text
faults_label_tags

'Interpretation'

In [133]:
faults_authorname_tags = faults_sections[0].find_all('td')[0].text
faults_authorname_tags

'Rob Butler'

In [136]:
faults_date_tags = faults_sections[0].find_all('td')[-1].text
faults_date_tags

'2021-03-12'

In [138]:
faults_base_url = 'https://seismicatlas.org//'

faults_img_tags = faults_sections[0]("img", {'class' : "img-responsive"})[0]['src']
faults_img_url = faults_base_url + faults_img_tags
faults_img_url

'https://seismicatlas.org///uploaded/image/202103/ba8eb087-bbce-4efe-903f-f5b5fd69509f.jpg'

In [139]:
faults_desc_tags = faults_sections[0].find_all('p')[0].text
faults_desc_tags

'An interpretation of the first profile acquired by the BIRPS programme in 1981: MOIST. It images the crustal structure of the West Orkney Basin area, offshore N Scotland. Compare with interpretations of other BIRPS profiles that were acquired over the following years, with increasingly effective acquisition and processing strategies (WINCH-1, DRUM and GRID-17). '

In [145]:
faults_base_url = 'https://seismicatlas.org//'

faults_info_tags = faults_sections[0].find('a')['href']
faults_info_url = faults_base_url + faults_info_tags
faults_info_url

'https://seismicatlas.org///entity?id=cad7a17f-d9c3-430e-83c2-7731247529ad'
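The triple slash in `https://seismicatlas.org///...` comes from concatenating a base URL that ends in `//` with an `href` that already starts with `/`. The standard library's `urllib.parse.urljoin` resolves a relative link against a base URL and avoids this. The sketch below is an alternative, not what the notebook uses; the outputs in this document keep the original triple-slash URLs.

```python
from urllib.parse import urljoin

# The page the links were scraped from, used as the base for resolving them
page_url = 'https://seismicatlas.org/search?f=1003&s&pn=0'

img_src = '/uploaded/image/202103/ba8eb087-bbce-4efe-903f-f5b5fd69509f.jpg'
info_href = '/entity?id=cad7a17f-d9c3-430e-83c2-7731247529ad'

# An href starting with "/" replaces the path and query of the base URL
print(urljoin(page_url, img_src))
print(urljoin(page_url, info_href))
# Absolute URLs are returned unchanged
print(urljoin(page_url, 'https://www.seismicatlas.org/'))
```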

In [146]:
faults_base_url = 'https://seismicatlas.org//'

def parse_faults_section(section):
    
    faults_title = section.find_all('a')[1].text
    faults_data = section.find_all('span', {'class' : "label"})[0].text
    faults_authorname = section.find_all('td')[0].text
    faults_date = section.find_all('td')[-1].text
    faults_img_url = faults_base_url + section("img", {'class' : "img-responsive"})[0]['src']
    faults_description = section.find_all('p')[0].text.strip()
    faults_info_url = faults_base_url + section.find('a')['href']
    
    return {'Title': faults_title,
            'Data Type': faults_data,
            'Author': faults_authorname, 
            'Date': faults_date,
            'Image url': faults_img_url,
            'Description': faults_description, 
            'Info url': faults_info_url}
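`parse_faults_section` reads the author from the first `td` and the date from the last one. That works for both table layouts seen so far (the sections in 3.1 have four rows, the fault sections have two), but it depends on row order. A label-based alternative is sketched below, assuming the `th` labels (`VSA Author:`, `Date Created:`) stay as shown; `summary_to_dict` is a hypothetical helper that works on plain strings so it can be tested without a live page.

```python
def summary_to_dict(labels, values):
    """Pair summary-table labels with their values.

    `labels` are the <th> texts (e.g. 'VSA Author:'), `values` the <td>
    texts in the same order. Trailing colons and whitespace are removed
    from the labels so they can be used as plain keys.
    """
    if len(labels) != len(values):
        raise ValueError('labels and values must have the same length')
    return {label.strip().rstrip(':').strip(): value.strip()
            for label, value in zip(labels, values)}


# Values taken from the fault section shown above
summary = summary_to_dict(['VSA Author:', 'Date Created:'],
                          ['Rob Butler', '2021-03-12'])
print(summary.get('VSA Author'), summary.get('Date Created'))
```

With Beautiful Soup, the two lists could be built as `[th.get_text(strip=True) for th in section.find_all('th')]` and `[td.get_text(strip=True) for td in section.find_all('td')]`; `summary.get('VSA Author', '')` then returns a default instead of raising when a row is missing.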

In [147]:
parse_faults_section(faults_sections[12])

{'Title': 'Faults and graben structure, offshore Tunisia',
 'Data Type': 'Interpretation',
 'Author': 'Rob Butler',
 'Date': '2021-02-06',
 'Image url': 'https://seismicatlas.org///uploaded/image/202102/704bc1b9-0922-4556-9e8f-92ce870076a3.jpg',
 'Description': 'This interpretation shows faults continuing to depth. The fault geometries on the right are obscured by migration and fault-shadowing artifacts. Note the scale - interpretation (less VE) version gives a more "realistic representation of basin structure.',
 'Info url': 'https://seismicatlas.org///entity?id=055c90aa-f84c-449a-8f25-ca6581235e7d'}

In [148]:
faults_sections_data = [parse_faults_section(section) for section in faults_sections]

In [149]:
faults_sections_data

[{'Title': 'Crustal Structure, West Orkney Basin, imaged by MOIST',
  'Data Type': 'Interpretation',
  'Author': 'Rob Butler',
  'Date': '2021-03-12',
  'Image url': 'https://seismicatlas.org///uploaded/image/202103/ba8eb087-bbce-4efe-903f-f5b5fd69509f.jpg',
  'Description': 'An interpretation of the first profile acquired by the BIRPS programme in 1981: MOIST. It images the crustal structure of the West Orkney Basin area, offshore N Scotland. Compare with interpretations of other BIRPS profiles that were acquired over the following years, with increasingly effective acquisition and processing strategies (WINCH-1, DRUM and GRID-17).',
  'Info url': 'https://seismicatlas.org///entity?id=cad7a17f-d9c3-430e-83c2-7731247529ad'},
 {'Title': 'Deep structure - Outer Isles Fault - from WINCH 1 profile',
  'Data Type': 'Interpretation',
  'Author': 'Rob Butler',
  'Date': '2021-03-12',
  'Image url': 'https://seismicatlas.org///uploaded/image/202103/9f048ab2-7295-41cd-8224-68a1527c2a5e.jpg',
  

In [150]:
# faults_sections_data = []
# for section in faults_sections:
#     faults_sections_data.append(parse_faults_section(section))
    
#     print(faults_sections_data)

In [151]:
faults_sections_df = pd.DataFrame(faults_sections_data)

In [152]:
faults_sections_df

| | Title | Data Type | Author | Date | Image url | Description | Info url |
|---|---|---|---|---|---|---|---|
| 0 | Crustal Structure, West Orkney Basin, imaged b... | Interpretation | Rob Butler | 2021-03-12 | https://seismicatlas.org///uploaded/image/2021... | An interpretation of the first profile acquire... | https://seismicatlas.org///entity?id=cad7a17f-... |
| 1 | Deep structure - Outer Isles Fault - from WINC... | Interpretation | Rob Butler | 2021-03-12 | https://seismicatlas.org///uploaded/image/2021... | An interpretation of BIRPS' WINCH-1 profile (f... | https://seismicatlas.org///entity?id=5daf55c7-... |
| 2 | Deep Structure - West Orkney Basin, imaged on ... | Interpretation | Rob Butler | 2021-03-12 | https://seismicatlas.org///uploaded/image/2021... | An interpretation of the BIRPS DRUM line (from... | https://seismicatlas.org///entity?id=92b4fd80-... |
| 3 | Deep structure Western Orkney Basin - from GRI... | Interpretation | Rob Butler | 2021-03-12 | https://seismicatlas.org///uploaded/image/2021... | Interpretation of the GRID 17 profile (aka Syn... | https://seismicatlas.org///entity?id=d89526c0-... |
| 4 | Inversion, North Sea | Interpretation | Rob Butler | 2021-02-19 | https://seismicatlas.org///uploaded/image/2021... | An interpretation of an inversion structure fr... | https://seismicatlas.org///entity?id=7998e3da-... |
| 5 | Interpreting a deepwater thrust system, offsho... | Interpretation | Rob Butler | 2021-02-18 | https://seismicatlas.org///uploaded/image/2021... | An interpretation of a deepwater thrust belt. ... | https://seismicatlas.org///entity?id=ed086012-... |
| 6 | Faults and graben (less VE version), offshore ... | Interpretation | Rob Butler | 2021-02-06 | https://seismicatlas.org///uploaded/image/2021... | A rather better representation of faults and g... | https://seismicatlas.org///entity?id=c56c95c3-... |
| 7 | Seismic stratigraphy exercise | Interpretation | Julien Moreau | 2013-09-12 | https://seismicatlas.org///uploaded/image/2013... | Correction of a seismic interpretation exercis... | https://seismicatlas.org///entity?id=ba4bfcdc-... |
| 8 | Closely spaced faults - and imaging issues, of... | Interpretation | Rob Butler | 2021-02-01 | https://seismicatlas.org///uploaded/image/2021... | This and associated interpretation (annotated)... | https://seismicatlas.org///entity?id=6157758c-... |
| 9 | A high resolution version of the thrust belt, ... | Scene | Rob Butler | 2009-08-17 | https://seismicatlas.org///uploaded/image/2009... | High quality seismic reflection data reveal th... | https://seismicatlas.org///entity?id=8f3d5439-... |


In [153]:
import requests
from bs4 import BeautifulSoup

def create_faults_url_docs(faults_url):
    """Download one VSA results page and return it parsed with BeautifulSoup."""
    # Download the results page
    response = requests.get(faults_url)

    # Stop early if the page did not load successfully
    if response.status_code != 200:
        raise Exception('Failed to load page {}'.format(faults_url))

    # Parse the HTML with Beautiful Soup
    faults_url_doc = BeautifulSoup(response.text, 'html.parser')

    return faults_url_doc
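A single failed request currently aborts the whole scrape. Below is a minimal sketch of a retry wrapper with exponential backoff that could sit around `create_faults_url_docs`. The name `fetch_with_retries` and its default values (3 attempts, 1 s base delay) are hypothetical choices, not part of the original notebook or any VSA guidance.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def fetch_with_retries(fetch: Callable[[str], T], url: str,
                       attempts: int = 3, backoff: float = 1.0) -> T:
    """Call fetch(url), retrying on any exception with exponential backoff.

    `fetch` is assumed to behave like create_faults_url_docs: it returns a
    parsed page or raises when the page cannot be loaded.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            # wait backoff, 2*backoff, 4*backoff, ... before the next try
            time.sleep(backoff * 2 ** attempt)
    raise AssertionError("unreachable")
```

Usage, assuming `faults_url` is the list of results-page URLs: `doc = fetch_with_retries(create_faults_url_docs, faults_url[0])`.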

In [154]:
def get_faults_url_docs(faults_url_list):
    # Download and parse every results page, in order
    return [create_faults_url_docs(url) for url in faults_url_list]
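`get_faults_url_docs` sends its 16 requests back to back. A gentler variant pauses between requests. This is a sketch only: `fetch_all_politely` is a hypothetical helper, and the 1 s default delay is an assumption rather than a documented VSA rate limit.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def fetch_all_politely(urls: Sequence[str], fetch: Callable[[str], T],
                       delay: float = 1.0) -> List[T]:
    """Fetch each URL in order, sleeping `delay` seconds between requests."""
    results: List[T] = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay)  # pause only between consecutive requests
        results.append(fetch(url))
    return results
```

With `fetch=create_faults_url_docs`, this returns the same list as `get_faults_url_docs(faults_url)`, just spread out over time.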

In [155]:
faults_url_doc = get_faults_url_docs(faults_url)

In [156]:
len(faults_url_doc)

16

In [157]:
def create_faults_sections(f_url_doc):

    faults_content_div = f_url_doc.find('div', class_='content')
    faults_sections = faults_content_div.find_all('section')
    
    return faults_sections

In [158]:
faults_url_doc_0 = create_faults_sections(faults_url_doc[0])
faults_url_doc_1 = create_faults_sections(faults_url_doc[1])
faults_url_doc_2 = create_faults_sections(faults_url_doc[2])
faults_url_doc_3 = create_faults_sections(faults_url_doc[3])
faults_url_doc_4 = create_faults_sections(faults_url_doc[4])
faults_url_doc_5 = create_faults_sections(faults_url_doc[5])

In [159]:
len(faults_url_doc_3)

25

In [160]:
faults_sections = []
faults_sections.append(faults_url_doc_0)
faults_sections.append(faults_url_doc_1)

In [161]:
faults_sections

[[<section>
  <figure class="thumbnail"><a href="/entity?id=cad7a17f-d9c3-430e-83c2-7731247529ad"><img alt="" class="img-responsive" src="/uploaded/image/202103/ba8eb087-bbce-4efe-903f-f5b5fd69509f.jpg"/></a></figure>
  <h4>
  <span><a href="/entity?id=cad7a17f-d9c3-430e-83c2-7731247529ad">Crustal Structure, West Orkney Basin, imaged by MOIST</a></span>
  <span class="label label-interpretation">Interpretation</span>
  </h4>
  <table class="summary">
  <tbody><tr>
  <th>VSA Author:</th>
  <td>Rob Butler</td>
  </tr>
  <tr>
  <th>Date Created:</th>
  <td>2021-03-12</td>
  </tr>
  </tbody></table>
  <p>An interpretation of the first profile acquired by the BIRPS programme in 1981: MOIST. It images the crustal structure of the West Orkney Basin area, offshore N Scotland. Compare with interpretations of other BIRPS profiles that were acquired over the following years, with increasingly effective acquisition and processing strategies (WINCH-1, DRUM and GRID-17). </p>
  </section>,
  <sectio
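The section markup printed above has a regular shape: a thumbnail `<img>`, then an `<h4>` whose link holds the title and entity URL. To make that structure explicit, here is a sketch that extracts those three fields using Python's standard-library `html.parser` instead of BeautifulSoup. It is a swapped-in technique for illustration, not the notebook's `parse_faults_section`, and the class name and field keys are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

class SectionParser(HTMLParser):
    """Collect title, entity href and thumbnail src from each <section>."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[Dict[str, str]] = []
        self._current: Optional[Dict[str, str]] = None
        self._in_h4 = False
        self._in_title_link = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "section":
            self._current = {"Title": "", "Info href": "", "Image src": ""}
        elif self._current is None:
            return  # ignore markup outside a <section>
        elif tag == "img" and not self._current["Image src"]:
            self._current["Image src"] = attr_map.get("src") or ""
        elif tag == "h4":
            self._in_h4 = True
        elif tag == "a" and self._in_h4:
            # the link inside <h4> carries the title text and entity URL
            self._in_title_link = True
            self._current["Info href"] = attr_map.get("href") or ""

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_title_link = False
        elif tag == "h4":
            self._in_h4 = False
        elif tag == "section" and self._current is not None:
            self._current["Title"] = self._current["Title"].strip()
            self.records.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._in_title_link and self._current is not None:
            self._current["Title"] += data
```

Usage: `parser = SectionParser(); parser.feed(html_text); parser.records` yields one dict per `<section>`. The label `<span>` ("Interpretation") is skipped because it lies outside the title link.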

In [162]:
faults_sections = []

for doc in faults_url_doc:
    faults_sections.append(create_faults_sections(doc))

In [163]:
len(faults_sections)

16

In [61]:
create_faults_sections(faults_url_doc[15])

[<section>
 <figure class="thumbnail"><a href="/entity?id=0e1c2621-fdfb-4bf6-8e46-11c6d074c8ff"><img alt="" class="img-responsive" src="/uploaded/image/200802/13fd90b8-be40-4cbc-bbee-5bb82e666ea8.jpg"/></a></figure>
 <h4>
 <span><a href="/entity?id=0e1c2621-fdfb-4bf6-8e46-11c6d074c8ff">DW Niger delta fold - profile N34 - interpreted</a></span>
 <span class="label label-interpretation">Interpretation</span>
 </h4>
 <table class="summary">
 <tbody><tr>
 <th>VSA Author:</th>
 <td>Rob Butler</td>
 </tr>
 <tr>
 <th>Date Created:</th>
 <td>2008-02-11</td>
 </tr>
 </tbody></table>
 <p>Fore-thrust structure in the deep water western Niger delta. The simple thrust ramp at depth passes up-dip into a zone of distributed deformation that defines a trishear zone, contained as the fore-limb of the anticline.
                 
 
 </p>
 </section>,
 <section>
 <figure class="thumbnail"><a href="/entity?id=46332bc9-1654-4480-a0e4-8e255b5677f2"><img alt="" class="img-responsive" src="/uploaded/image/200

In [62]:
faults_url_divs = []

# Note: len(faults_url_doc[10]) is the number of top-level nodes in the parsed
# page at index 10, not the number of pages. faults_content_div keeps only the
# result for the last page this loop visits.
for i in range(len(faults_url_doc[10])):
    faults_content_div = faults_url_doc[i].find_all('div', class_='content')
    faults_url_divs.append(faults_content_div)

In [63]:
len(faults_content_div)

1

In [64]:
faults_url_sections = []

# Take the first <section> in each content div found for the last page above
for div in faults_content_div:
    faults_url_sections.append(div.find('section'))

In [65]:
len(faults_url_sections)

1

In [66]:
faults_url_sections

[<section>
 <figure class="thumbnail"><a href="/entity?id=afeba2b7-c281-4493-9dab-42f4256b49ea"><img alt="" class="img-responsive" src="/uploaded/image/201610/4f1d5216-5dd0-4cc2-94d7-b7a05b29a461.jpg"/></a></figure>
 <h4>
 <span><a href="/entity?id=afeba2b7-c281-4493-9dab-42f4256b49ea">AES84-16</a></span>
 <span class="label label-regional">Regional Project</span>
 </h4>
 <table class="summary">
 <tbody><tr>
 <th>VSA Author:</th>
 <td>Taija Torvela</td>
 </tr>
 <tr>
 <th>Scenes:</th>
 <td>0</td>
 </tr>
 <tr>
 <th>Interpretations:</th>
 <td>0</td>
 </tr>
 <tr>
 <th>Date Created:</th>
 <td>2016-10-17</td>
 </tr>
 </tbody></table>
 <p>This image is used in the interpretation by Jimenez-Bonilla et al. (2017, Tectonics) of the internal structure and basement properties of the external Betics FTB from Sierras Subbeticas NW-wards to the Guadalquivir Basin. Please see the publication for interpretation and discussion. Location map and paper link in the docs and links section of this image.</p>
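Every entry, whether an interpretation or a regional project like the one above, links to `/entity?id=...`. That `id` is a stable key for joining or deduplicating records. A minimal sketch of pulling it out with the standard library follows; `entity_id` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlparse

def entity_id(href: str) -> str:
    """Return the `id` query parameter of a VSA entity link, or '' if absent."""
    values = parse_qs(urlparse(href).query).get("id", [])
    return values[0] if values else ""
```

It accepts both the relative hrefs in the HTML and the absolute `Info url` values in the DataFrame.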

In [67]:
def get_faults_sections(faults_url_doc):
    # Extract the <section> elements from every parsed page
    return [create_faults_sections(doc) for doc in faults_url_doc]
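`get_faults_sections` returns one list of sections per page, and the pages do not all hold the same number of sections. A sketch of flattening and parsing them without assuming a fixed page size is shown below. `parse_all_sections` is a hypothetical helper; `parse` stands for the notebook's `parse_faults_section`.

```python
from itertools import chain
from typing import Callable, Iterable, List, Sequence, TypeVar

S = TypeVar("S")
R = TypeVar("R")

def parse_all_sections(pages: Iterable[Sequence[S]],
                       parse: Callable[[S], R]) -> List[R]:
    """Flatten per-page section lists and parse every section.

    Iterating over the sections themselves, rather than a fixed
    range(0, 25), also works when a page holds fewer sections.
    """
    return [parse(section) for section in chain.from_iterable(pages)]
```

Usage: `parse_all_sections(faults_sections_data_all, parse_faults_section)`.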

In [68]:
# faults_sections_data = []
# for section in faults_sections:
#     faults_sections_data.append(parse_faults_section(section))
    
#     print(faults_sections_data)

In [69]:
faults_sections_data_all = get_faults_sections(faults_url_doc)

In [70]:
len(faults_sections_data_all)

16

In [71]:
faults_sections_data_all[15][16]

<section>
<figure class="thumbnail"><a href="/entity?id=1725067e-04ff-4a5c-8020-acdd26b193f4"><img alt="" class="img-responsive" src="/uploaded/image/200802/5b27d519-5149-412c-8414-aa285f73fbca.jpg"/></a></figure>
<h4>
<span><a href="/entity?id=1725067e-04ff-4a5c-8020-acdd26b193f4">DW Nigeria Profile 1 (V=H) linework</a></span>
<span class="label label-interpretation">Interpretation</span>
</h4>
<table class="summary">
<tbody><tr>
<th>VSA Author:</th>
<td>Rob Butler</td>
</tr>
<tr>
<th>Date Created:</th>
<td>2008-02-04</td>
</tr>
</tbody></table>
<p></p>
</section>

In [72]:
flts_sections_data = []

# Iterate over the sections each page actually has: the last page holds fewer
# than 25, so a fixed range(0, 25) raises IndexError there.
for page_sections in faults_sections_data_all:
    for section in page_sections:
        flts_sections_data.append(parse_faults_section(section))

print(flts_sections_data)

[{'Title': 'Crustal Structure, West Orkney Basin, imaged by MOIST', 'Data Type': 'Interpretation', 'Author': 'Rob Butler', 'Date': '2021-03-12', 'Image url': 'https://seismicatlas.org///uploaded/image/202103/ba8eb087-bbce-4efe-903f-f5b5fd69509f.jpg', 'Description': 'An interpretation of the first profile acquired by the BIRPS programme in 1981: MOIST. It images the crustal structure of the West Orkney Basin area, offshore N Scotland. Compare with interpretations of other BIRPS profiles that were acquired over the following years, with increasingly effective acquisition and processing strategies (WINCH-1, DRUM and GRID-17).', 'Info url': 'https://seismicatlas.org///entity?id=cad7a17f-d9c3-430e-83c2-7731247529ad'}, {'Title': 'Deep structure - Outer Isles Fault - from WINCH 1 profile', 'Data Type': 'Interpretation', 'Author': 'Rob Butler', 'Date': '2021-03-12', 'Image url': 'https://seismicatlas.org///uploaded/image/202103/9f048ab2-7295-41cd-8224-68a1527c2a5e.jpg', 'Description': "An inte

In [73]:
len(flts_sections_data)

392
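The count is consistent with paged results. Assume every page except the last holds 25 sections, as `len(faults_url_doc_3)` suggests. Then 392 records fill 15 full pages (375) and leave 17 on a 16th page. This matches the 16 pages downloaded and the fact that `faults_sections_data_all[15][16]` exists. It is also why a fixed `range(0, 25)` indexes past the end of the last page. The small helper below (hypothetical, for checking the arithmetic) computes that layout.

```python
import math
from typing import Tuple

def page_layout(total_items: int, page_size: int) -> Tuple[int, int]:
    """Return (number of pages, items on the last page) for a paged listing."""
    if total_items <= 0:
        return (0, 0)
    pages = math.ceil(total_items / page_size)
    last = total_items - (pages - 1) * page_size
    return (pages, last)

print(page_layout(392, 25))  # (16, 17)
```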

In [74]:
faults_sections_all_df = pd.DataFrame(flts_sections_data)

In [75]:
faults_sections_all_df

Unnamed: 0,Title,Data Type,Author,Date,Image url,Description,Info url
0,"Crustal Structure, West Orkney Basin, imaged b...",Interpretation,Rob Butler,2021-03-12,https://seismicatlas.org///uploaded/image/2021...,An interpretation of the first profile acquire...,https://seismicatlas.org///entity?id=cad7a17f-...
1,Deep structure - Outer Isles Fault - from WINC...,Interpretation,Rob Butler,2021-03-12,https://seismicatlas.org///uploaded/image/2021...,An interpretation of BIRPS' WINCH-1 profile (f...,https://seismicatlas.org///entity?id=5daf55c7-...
2,"Deep Structure - West Orkney Basin, imaged on ...",Interpretation,Rob Butler,2021-03-12,https://seismicatlas.org///uploaded/image/2021...,An interpretation of the BIRPS DRUM line (from...,https://seismicatlas.org///entity?id=92b4fd80-...
3,Deep structure Western Orkney Basin - from GRI...,Interpretation,Rob Butler,2021-03-12,https://seismicatlas.org///uploaded/image/2021...,Interpretation of the GRID 17 profile (aka Syn...,https://seismicatlas.org///entity?id=d89526c0-...
4,"Inversion, North Sea",Interpretation,Rob Butler,2021-02-19,https://seismicatlas.org///uploaded/image/2021...,An interpretation of an inversion structure fr...,https://seismicatlas.org///entity?id=7998e3da-...
...,...,...,...,...,...,...,...
387,DW Nigeria profile 1 - block colour,Interpretation,Rob Butler,2008-02-04,https://seismicatlas.org///uploaded/image/2008...,Interprted sectiion through the toe thrust sys...,https://seismicatlas.org///entity?id=bea7b151-...
388,BIRPS - DRUM -Flack & Warner 1990 interpretation,Interpretation,Estelle Mortimer,2008-02-05,https://seismicatlas.org///uploaded/image/2008...,In this interpretation (after Flack and Warner...,https://seismicatlas.org///entity?id=f3b921ab-...
389,BIRPS- DRUM -McGeary & Warner 1985 interpretation,Interpretation,Estelle Mortimer,2008-02-05,https://seismicatlas.org///uploaded/image/2008...,This interpreted line drawing showing the prin...,https://seismicatlas.org///entity?id=210f0fc9-...
390,Salt model Silverpit - interpreted,Interpretation,John Underhill,2008-02-04,https://seismicatlas.org///uploaded/image/2008...,In this interpretation of a seismic section th...,https://seismicatlas.org///entity?id=c7fc4157-...
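The `Image url` and `Info url` columns contain `///` after the host. This is most likely because a slash-terminated base URL was concatenated with a slash-prefixed href; the code that builds them is not shown in this section. Below is a sketch that collapses repeated slashes in the path only, plus a helper that derives a local file name for downloading each image into the CNN dataset. Both helper names are hypothetical. The sketch also assumes the server treats `///uploaded/...` and `/uploaded/...` as the same resource.

```python
import posixpath
import re
from urllib.parse import urlsplit, urlunsplit

def collapse_slashes(url: str) -> str:
    """Replace runs of '/' in the URL path with a single '/'."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

def image_filename(url: str) -> str:
    """Return the last path component, e.g. '<uuid>.jpg', for saving to disk."""
    return posixpath.basename(urlsplit(url).path)
```

With pandas, the cleanup could be applied column-wise, e.g. `faults_sections_all_df['Image url'].map(collapse_slashes)`.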


In [76]:
faults_sections_all_df.to_csv('faults_images_data.csv', index=False)
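Before saving, it may be worth dropping repeated entries keyed on `Info url`, in case the same entity appears on two result pages. Whether that happens in this dataset is not established here. In pandas this is `faults_sections_all_df.drop_duplicates(subset='Info url')`. The sketch below shows the same idea with only the standard library (`dedupe_records` and `write_records` are hypothetical names), writing to any text stream.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

def dedupe_records(records: Iterable[Dict[str, str]],
                   key: str = "Info url") -> List[Dict[str, str]]:
    """Keep the first record for each distinct value of `key`, preserving order."""
    seen = set()
    unique = []
    for record in records:
        value = record.get(key)
        if value in seen:
            continue
        seen.add(value)
        unique.append(record)
    return unique

def write_records(records: List[Dict[str, str]], stream: TextIO) -> None:
    """Write records as CSV with a header row taken from the first record."""
    fieldnames = list(records[0].keys()) if records else []
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)

buffer = io.StringIO()
write_records(dedupe_records([{"Title": "A", "Info url": "u1"},
                              {"Title": "A again", "Info url": "u1"}]), buffer)
```

To write a file instead, pass `open('faults_images_data.csv', 'w', newline='')` as the stream.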

In [77]:
# Execute this to save new versions of the notebook

jovian.commit(project="seismic-data-processing")


[jovian] Attempting to save notebook..
[jovian] Updating notebook "ramysaleem/seismic-data-processing" on https://jovian.ai
[jovian] Uploading notebook..
[jovian] Uploading additional files...
[jovian] Committed successfully! https://jovian.ai/ramysaleem/seismic-data-processing


'https://jovian.ai/ramysaleem/seismic-data-processing'