![logo_ironhack_blue 7](https://user-images.githubusercontent.com/23629340/40541063-a07a0a8a-601a-11e8-91b5-2f13e4e6b441.png)

# Lab | Web Scraping Single Page

#### Business goal:

- Check the `case_study_gnod.md` file.
- Make sure you've understood the big picture of your project:

  - the goal of the company (`Gnod`),
  - their current product (`Gnoosic`),
  - their strategy, and
  - how your project fits into this context.

  Re-read the business case and the e-mail from the CTO, take a look at the flowchart and create an initial Trello board with the tasks you think you'll have to accomplish.

#### Instructions - Scraping popular songs

Your product will take a song as input from the user and output another song (the recommendation). In most cases, the recommended song should be similar to the input song. However, the CTO thinks that if the input song is currently in the top charts, the user will enjoy the recommendation more if it is also popular at the moment.

You have to find data on the internet about currently popular songs. Billboard maintains a weekly Top 100 of "hot" songs here: [https://www.billboard.com/charts/hot-100](https://www.billboard.com/charts/hot-100).

It's a good place to start! Scrape the current top 100 songs and their respective artists, and put the information into a pandas dataframe.
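A minimal sketch of the whole single-page scrape, assuming Billboard still marks titles with `li h3` and artists with the `.c-label.a-no-trucate` class used in the cells below (both can change whenever the site is redesigned). `parse_hot100` is a hypothetical helper name. It runs each CSS query once and refuses to build a DataFrame from lists that are too short, instead of silently producing misaligned columns.

```python
from bs4 import BeautifulSoup
import pandas as pd


def parse_hot100(html, n=100):
    """Return a DataFrame with the first n songs and artists found in the chart HTML.

    The selectors are assumptions taken from this notebook, not a documented Billboard API.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # Run each CSS query once instead of once per loop iteration
    titles = [tag.get_text().strip() for tag in soup.select('li h3')]
    artists = [tag.get_text().strip() for tag in soup.select('.c-label.a-no-trucate')]
    # Fail loudly instead of building a DataFrame from incomplete lists
    if len(titles) < n or len(artists) < n:
        raise ValueError(f'expected {n} songs and artists, found {len(titles)} and {len(artists)}')
    return pd.DataFrame({'Song': titles[:n], 'Artist': artists[:n]})
```

Against the live page, `parse_hot100(response.content)` should give the same two columns as `df` further down, for as long as those selectors still match.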



In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [2]:
url = 'https://www.billboard.com/charts/hot-100'

In [3]:
response = requests.get(url)

In [4]:

response

<Response [200]>
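The cell above only inspects the status after the fact. A small helper that fails immediately on 4xx/5xx responses and never waits indefinitely on a slow server; `fetch_html` is a hypothetical name, the 10-second timeout and the User-Agent string are arbitrary choices, and the `get` parameter exists only so the function can be tried without network access.

```python
import requests


def fetch_html(url, timeout=10, get=requests.get):
    """Download a page and return its raw bytes, raising on HTTP errors.

    The User-Agent value is an arbitrary example; some sites serve different
    markup (or refuse the request) depending on it.
    """
    headers = {'User-Agent': 'Mozilla/5.0 (compatible; gnod-lab-scraper)'}
    response = get(url, headers=headers, timeout=timeout)
    # Turn 4xx/5xx status codes into exceptions instead of parsing an error page
    response.raise_for_status()
    return response.content
```

Usage in this notebook would be `soup = BeautifulSoup(fetch_html(url), 'html.parser')`.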

In [5]:
soup = BeautifulSoup(response.content, 'html.parser')

In [6]:
soup


In [7]:
soup.select('li h3')[0].get_text().strip()

'Last Night'

In [8]:
Top_100_Songs = []
# Run the CSS query once, then take the first 100 titles
for tag in soup.select('li h3')[:100]:
    Top_100_Songs.append(tag.get_text().strip())

In [9]:
Top_100_Songs

['Last Night',
 'Kill Bill',
 'Flowers',
 'Ella Baila Sola',
 'Un x100to',
 'Calm Down',
 "Creepin'",
 'Die For You',
 "Boy's A Liar, Pt. 2",
 'Anti-Hero',
 'La Bebe',
 'Search & Rescue',
 'Favorite Song',
 'Fast Car',
 'Sure Thing',
 'Players',
 'Rock And A Hard Place',
 'Double Fantasy',
 'You Proof',
 'As It Was',
 'Chemical',
 'One Thing At A Time',
 "Thinkin' Bout Me",
 'Thought You Should Know',
 "I'm Good (Blue)",
 'Something In The Orange',
 'Thank God',
 'Por Las Noches',
 'Princess Diana',
 'Escapism',
 'Under The Influence',
 "Dancin' In The Country",
 'PRC',
 'Slut Me Out',
 'Lavender Haze',
 'Handle On You',
 'Unholy',
 'Snooze',
 'Just Wanna Rock',
 'AMG',
 'Cupid',
 'Superhero (Heroes & Villains)',
 'TQG',
 'Wait In The Truck',
 'Eyes Closed',
 'Spin Bout U',
 'Daylight',
 'Rich Flex',
 'Love You Anyway',
 'Bebe Dame',
 'Wild As Her',
 'See You Again',
 'Tennessee Orange',
 'Next Thing You Know',
 'Alone',
 'Chanel',
 'El Azul',
 'Haegeum',
 'Low Down',
 'Need A Favor',


In [10]:
soup.select('span.chart-element__information span')

[]

In [11]:
soup.select('chart-element__information__artist')

[]

In [12]:
soup.select(".c-label.a-no-trucate")[0].get_text().strip()

'Morgan Wallen'

In [13]:
# for i in range(50):
#     print(i, soup.select("li span")[i])

In [14]:
soup.select("li span")[16].get_text().strip()

'Morgan Wallen'

In [15]:
Top_100_Artist = []
# Run the CSS query once, then take the first 100 artist labels
for tag in soup.select('.c-label.a-no-trucate')[:100]:
    Top_100_Artist.append(tag.get_text().strip())

In [16]:
Top_100_Artist

['Morgan Wallen',
 'SZA',
 'Miley Cyrus',
 'Eslabon Armado X Peso Pluma',
 'Grupo Frontera X Bad Bunny',
 'Rema & Selena Gomez',
 'Metro Boomin, The Weeknd & 21 Savage',
 'The Weeknd & Ariana Grande',
 'PinkPantheress & Ice Spice',
 'Taylor Swift',
 'Yng Lvcas x Peso Pluma',
 'Drake',
 'Toosii',
 'Luke Combs',
 'Miguel',
 'Coi Leray',
 'Bailey Zimmerman',
 'The Weeknd Featuring Future',
 'Morgan Wallen',
 'Harry Styles',
 'Post Malone',
 'Morgan Wallen',
 'Morgan Wallen',
 'Morgan Wallen',
 'David Guetta & Bebe Rexha',
 'Zach Bryan',
 'Kane Brown With Katelyn Brown',
 'Peso Pluma',
 'Ice Spice & Nicki Minaj',
 'RAYE Featuring 070 Shake',
 'Chris Brown',
 'Tyler Hubbard',
 'Peso Pluma X Natanael Cano',
 'NLE Choppa',
 'Taylor Swift',
 'Parker McCollum',
 'Sam Smith & Kim Petras',
 'SZA',
 'Lil Uzi Vert',
 'Gabito Ballesteros, Peso Pluma & Natanael Cano',
 'Fifty Fifty',
 'Metro Boomin, Future & Chris Brown',
 'Karol G x Shakira',
 'HARDY Featuring Lainey Wilson',
 'Ed Sheeran',
 'Drake 

In [18]:
data = {'Song': Top_100_Songs, 'Artist': Top_100_Artist}

In [19]:
data

{'Song': ['Last Night',
  'Kill Bill',
  'Flowers',
  'Ella Baila Sola',
  'Un x100to',
  'Calm Down',
  "Creepin'",
  'Die For You',
  "Boy's A Liar, Pt. 2",
  'Anti-Hero',
  'La Bebe',
  'Search & Rescue',
  'Favorite Song',
  'Fast Car',
  'Sure Thing',
  'Players',
  'Rock And A Hard Place',
  'Double Fantasy',
  'You Proof',
  'As It Was',
  'Chemical',
  'One Thing At A Time',
  "Thinkin' Bout Me",
  'Thought You Should Know',
  "I'm Good (Blue)",
  'Something In The Orange',
  'Thank God',
  'Por Las Noches',
  'Princess Diana',
  'Escapism',
  'Under The Influence',
  "Dancin' In The Country",
  'PRC',
  'Slut Me Out',
  'Lavender Haze',
  'Handle On You',
  'Unholy',
  'Snooze',
  'Just Wanna Rock',
  'AMG',
  'Cupid',
  'Superhero (Heroes & Villains)',
  'TQG',
  'Wait In The Truck',
  'Eyes Closed',
  'Spin Bout U',
  'Daylight',
  'Rich Flex',
  'Love You Anyway',
  'Bebe Dame',
  'Wild As Her',
  'See You Again',
  'Tennessee Orange',
  'Next Thing You Know',
  'Alone',
  

In [20]:
df = pd.DataFrame(data)

In [21]:
df

| | Song | Artist |
|---|---|---|
| 0 | Last Night | Morgan Wallen |
| 1 | Kill Bill | SZA |
| 2 | Flowers | Miley Cyrus |
| 3 | Ella Baila Sola | Eslabon Armado X Peso Pluma |
| 4 | Un x100to | Grupo Frontera X Bad Bunny |
| ... | ... | ... |
| 95 | It Matters To Her | Scotty McCreery |
| 96 | Like Crazy | Jimin |
| 97 | All Of The Girls You Loved Before | Taylor Swift |
| 98 | 5 Leaf Clover | Luke Combs |


In [22]:
df[df['Artist'].str.contains('Harry')]

| | Song | Artist |
|---|---|---|
| 19 | As It Was | Harry Styles |


In [23]:
df[df['Artist'].str.contains('Bunny')]

| | Song | Artist |
|---|---|---|
| 4 | Un x100to | Grupo Frontera X Bad Bunny |


In [24]:
df[df['Artist'].str.contains('Peso')]

| | Song | Artist |
|---|---|---|
| 3 | Ella Baila Sola | Eslabon Armado X Peso Pluma |
| 10 | La Bebe | Yng Lvcas x Peso Pluma |
| 27 | Por Las Noches | Peso Pluma |
| 32 | PRC | Peso Pluma X Natanael Cano |
| 39 | AMG | Gabito Ballesteros, Peso Pluma & Natanael Cano |
| 55 | Chanel | Becky G & Peso Pluma |
| 56 | El Azul | Junior H x Peso Pluma |
| 79 | Igualito A Mi Apa | Fuerza Regida & Peso Pluma |


In [25]:
df[df['Artist'].str.contains('Morgan')]

| | Song | Artist |
|---|---|---|
| 0 | Last Night | Morgan Wallen |
| 18 | You Proof | Morgan Wallen |
| 21 | One Thing At A Time | Morgan Wallen |
| 22 | Thinkin' Bout Me | Morgan Wallen |
| 23 | Thought You Should Know | Morgan Wallen |
| 63 | Ain't That Some | Morgan Wallen |
| 68 | Cowgirls | Morgan Wallen Featuring ERNEST |
| 69 | I Wrote The Book | Morgan Wallen |
| 72 | Man Made A Bar | Morgan Wallen Featuring Eric Church |
| 73 | Everything I Love | Morgan Wallen |
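`str.contains` is case-sensitive and treats its pattern as a regular expression by default, so a query such as `'peso'` finds nothing and a name containing `(` can raise a regex error. A small sketch of a search that matches the text literally and ignores case; `songs_by` is a hypothetical helper name.

```python
import pandas as pd


def songs_by(df, name):
    """Rows whose Artist column contains `name`, ignoring case and regex syntax."""
    # regex=False: match the text literally; na=False: treat missing artists as non-matches
    mask = df['Artist'].str.contains(name, case=False, regex=False, na=False)
    return df[mask]
```

With the chart above, `songs_by(df, 'peso')` should return the same rows as `df[df['Artist'].str.contains('Peso')]`.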


In [26]:
pd.set_option('display.max_rows', 100)
df

| | Song | Artist |
|---|---|---|
| 0 | Last Night | Morgan Wallen |
| 1 | Kill Bill | SZA |
| 2 | Flowers | Miley Cyrus |
| 3 | Ella Baila Sola | Eslabon Armado X Peso Pluma |
| 4 | Un x100to | Grupo Frontera X Bad Bunny |
| 5 | Calm Down | Rema & Selena Gomez |
| 6 | Creepin' | Metro Boomin, The Weeknd & 21 Savage |
| 7 | Die For You | The Weeknd & Ariana Grande |
| 8 | Boy's A Liar, Pt. 2 | PinkPantheress & Ice Spice |
| 9 | Anti-Hero | Taylor Swift |


# Lab | Web Scraping Multiple Pages

#### Instructions 

#### Prioritize the MVP

In the previous lab, you had to scrape data about "hot songs". It's critical to be on track with that part, as it was part of the request from the CTO.

If you couldn't finish the first lab, use this time to go back to it.

#### Expand the project

If you're done, you can try to expand the project on your own. Here are a few suggestions:

- Find other lists of hot songs on the internet and scrape them too: having a bigger pool of songs will be awesome!
- Apply the same logic to other "groups" of songs: the best songs from a decade or from a country / culture / language / genre.
- Wikipedia maintains a large collection of lists of songs: https://en.wikipedia.org/wiki/Lists_of_songs

#### Practice web scraping

As you've seen, scraping the internet is a skill that can get you all sorts of information. Here are a few small challenges you can try to gain more experience in the field:

- Retrieve an arbitrary Wikipedia page of "Python" and create a list of links on that page: `url ='https://en.wikipedia.org/wiki/Python'`
- Find the number of titles that have changed in the United States Code since its last release point: `url = 'http://uscode.house.gov/download/download.shtml'`
- Create a Python list with the names on the FBI's Ten Most Wanted list: `url = 'https://www.fbi.gov/wanted/topten'`
- Display information on the 20 latest earthquakes reported by the EMSC (date, time, latitude, longitude and region name) as a pandas DataFrame: `url = 'https://www.emsc-csem.org/Earthquake/'`
- List all language names and the number of related articles, in the order they appear on [wikipedia.org](https://www.wikipedia.org/): `url = 'https://www.wikipedia.org/'`
- Create a list of the different kinds of datasets available on [data.gov.uk](https://data.gov.uk/): `url = 'https://data.gov.uk/'`
- Display the top 10 languages by number of native speakers stored in a pandas dataframe: `url = 'https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers'`



### Adding more songs: the current top 100 songs in Mexico (Spotify daily chart, via kworb.net)

In [27]:
url = 'https://kworb.net/spotify/country/mx_daily.html'

In [28]:
response = requests.get(url)

In [29]:
soup = BeautifulSoup(response.content, 'html.parser')

In [30]:
soup.select('body > div.container > div.subcontainer > table.sortable')

[<table class="sortable" id="spotifydaily">
 <colgroup><col class="col-pos"/><col class="col-pos"/><col class="col-title"/><col class="col-period"/><col class="col-peak"/><col class="col-peak2"/><col class="col-streams"/><col class="col-streams"/><col class="col-streams"/><col class="col-streams"/><col class="col-total"/></colgroup>
 <thead><tr><th class="np">Pos</th><th class="np">P+</th><th class="mp text col-title">Artist and Title</th><th>Days</th><th>Pk</th><th class="mini text">(x?)</th><th>Streams</th><th>Streams+</th><th>7Day</th><th>7Day+</th><th>Total</th></tr></thead>
 <tbody>
 <tr><td class="np">1</td>
 <td class="np">=</td>
 <td class="text mp"><div><a href="../artist/6XkjpgcEsYab502Vr1bBeW.html">Grupo Frontera</a> - <a href="../track/6pD0ufEQq0xdHSsRbg9LBK.html">un x100to</a> (w/ <a href="../artist/4q3ewBCX7sLwd24euuV69X.html">Bad Bunny</a>)</div></td>
 <td>17</td>
 <td>1</td><td class="np mini text">(x6)</td>
 <td>2,921,431</td>
 <td>+44,604</td>
 <td class="smaller">22,

In [31]:
soup.select('tbody > tr')

[<tr><td class="np">1</td>
 <td class="np">=</td>
 <td class="text mp"><div><a href="../artist/6XkjpgcEsYab502Vr1bBeW.html">Grupo Frontera</a> - <a href="../track/6pD0ufEQq0xdHSsRbg9LBK.html">un x100to</a> (w/ <a href="../artist/4q3ewBCX7sLwd24euuV69X.html">Bad Bunny</a>)</div></td>
 <td>17</td>
 <td>1</td><td class="np mini text">(x6)</td>
 <td>2,921,431</td>
 <td>+44,604</td>
 <td class="smaller">22,634,030</td>
 <td class="smaller">+119,779</td>
 <td>58,123,369</td></tr>,
 <tr><td class="np">2</td>
 <td class="np">=</td>
 <td class="text mp"><div><a href="../artist/0XeEobZplHxzM9QzFQWLiR.html">Eslabon Armado</a> - <a href="../track/3dnP0JxCgygwQH9Gm7q7nb.html">Ella Baila Sola</a> (w/ <a href="../artist/12GqGscKJx3aE4t07u7eVZ.html">Peso Pluma</a>)</div></td>
 <td>48</td>
 <td>1</td><td class="np mini text">(x41)</td>
 <td>2,709,895</td>
 <td>-42,631</td>
 <td class="smaller">23,216,467</td>
 <td class="smaller">-707,698</td>
 <td>149,218,297</td></tr>,
 <tr><td class="np">3</td>
 <td

In [32]:
soup.select('tr > td.text ')[0].get_text()

'Grupo Frontera - un x100to (w/ Bad Bunny)'

In [33]:
text = soup.select('tr > td.text ')[0].get_text()
parts = text.split(' - ')

In [34]:
artist = parts[0]
song = parts[1]

In [35]:
song

'un x100to (w/ Bad Bunny)'

In [36]:
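artists=[]
songs=[]

# Each chart row has two td.text cells (the title and the peak multiplier),
# so every second match is a title; 200 matches cover the top 100 rows
title_cells = soup.select('tr > td.text ')
for i in range(0, 200, 2):
    texto = title_cells[i].get_text()
    parts = texto.split(' - ')
    artist = parts[0]
    song = parts[1]
    artists.append(artist)
    songs.append(song)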
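Counting `td.text` matches two at a time depends on every row having exactly two cells with the `text` class (the title cell `td.text.mp` and the peak cell `td.np.mini.text`, as the HTML printed above shows). A sketch that walks the table row by row instead, assuming the `table#spotifydaily` id and the `td.mp` title cell seen in that HTML; `parse_kworb` is a hypothetical name.

```python
from bs4 import BeautifulSoup


def parse_kworb(html, n=100):
    """Return (artist, title) pairs for the first n chart rows."""
    soup = BeautifulSoup(html, 'html.parser')
    pairs = []
    for row in soup.select('table#spotifydaily tbody tr'):
        cell = row.select_one('td.mp')  # the "Artist and Title" column
        if cell is None:
            continue
        # partition splits on the first ' - ' only, so titles like 'La Bebe - Remix' stay whole
        artist, _, title = cell.get_text().partition(' - ')
        pairs.append((artist.strip(), title.strip()))
        if len(pairs) == n:
            break
    return pairs
```

Because each pair comes from a single row, an extra or missing cell in one row cannot shift every artist that follows it.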

In [37]:
artists

['Grupo Frontera',
 'Eslabon Armado',
 'Yng Lvcas',
 'Junior H',
 'Peso Pluma',
 'Natanael Cano',
 'Peso Pluma',
 'Chino Pacas',
 'Becky G',
 'Oscar Maydon',
 'Fuerza Regida',
 'Cartel De Santa',
 'KAROL G',
 'Carin Leon',
 'Junior H',
 'Peso Pluma',
 'Fuerza Regida',
 'Yahritza Y Su Esencia',
 'Peso Pluma',
 'Feid',
 'Fuerza Regida',
 'Grupo Marca Registrada',
 'El Chachito',
 'ROSALÍA',
 'Luis R Conriquez',
 'Ovy On The Drums',
 'Kenia OS',
 'Peso Pluma',
 'Bizarrap',
 'Oscar Maydon',
 'Yandel',
 'Peso Pluma',
 'Chino Pacas',
 'Natanael Cano',
 'Junior H',
 'Eladio Carrion',
 'Manuel Turizo',
 'Bad Bunny',
 'Bad Bunny',
 'Grupo Frontera',
 'Yuridia',
 'Arcángel',
 'KAROL G',
 'Bizarrap',
 'Natanael Cano',
 'Junior H',
 'Miley Cyrus',
 'Yng Lvcas',
 'Grupo Marca Registrada',
 'Ozuna',
 'Grupo Firme',
 'Bad Bunny',
 'Feid',
 'Peso Pluma',
 'La Adictiva',
 'Marshmello',
 'Gabito Ballesteros',
 'Jaziel Avilez',
 'Junior H',
 'KAROL G',
 'KAROL G',
 'Gorillaz',
 'Eslabon Armado',
 'Kali U

In [38]:
soup.select('tr > td.text ')[4].get_text()

'Yng Lvcas - La Bebe - Remix (w/ Peso Pluma)'
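This entry shows why `split(' - ')` followed by `parts[1]` loses information: the title itself contains ` - `, so "Remix" and the featured artist disappear (compare `'La Bebe'` in the `songs` list below). A sketch that splits on the first separator only and moves the `(w/ ...)` credit into its own field; the `(w/ ...)` format is assumed from the rows shown in this section, and `split_entry` is a hypothetical name.

```python
import re

# Matches a trailing "(w/ Artist A, Artist B)" credit, as seen in the kworb rows above
FEATURED = re.compile(r'\s*\(w/ (?P<names>[^)]*)\)\s*$')


def split_entry(text):
    """Split 'Artist - Title (w/ X, Y)' into (artist, title, [featured artists])."""
    artist, _, rest = text.partition(' - ')  # first ' - ' only
    match = FEATURED.search(rest)
    if match:
        featured = [name.strip() for name in match.group('names').split(',')]
        rest = rest[:match.start()]
    else:
        featured = []
    return artist.strip(), rest.strip(), featured
```

For example, `split_entry('Yng Lvcas - La Bebe - Remix (w/ Peso Pluma)')` returns `('Yng Lvcas', 'La Bebe - Remix', ['Peso Pluma'])`.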

In [39]:
songs

['un x100to (w/ Bad Bunny)',
 'Ella Baila Sola (w/ Peso Pluma)',
 'La Bebe',
 'El Azul (w/ Peso Pluma)',
 'PRC (w/ Natanael Cano)',
 'AMG (w/ Peso Pluma, Gabito Ballesteros)',
 'Por las Noches',
 'El Gordo Trae El Mando',
 'Chanel (w/ Peso Pluma)',
 'Fin de Semana (w/ Junior H)',
 'Ch y la Pizza (w/ Natanael Cano)',
 'Shorty Party (w/ La Kelly)',
 'TQG (w/ Shakira)',
 'Que Vuelvas (w/ Grupo Frontera)',
 'El Tsurito (w/ Gabito Ballesteros, Peso Pluma)',
 'Las Morras (w/ Blessd)',
 'Bebe Dame (w/ Grupo Frontera)',
 'Frágil (w/ Grupo Frontera)',
 'Rosa Pastel (w/ Jasiel Nuñez)',
 'Classy 101 (w/ Young Miko)',
 'Igualito a Mi Apá (w/ Peso Pluma)',
 'Di Que Si (w/ Grupo Frontera)',
 'En Paris (w/ Junior H)',
 'BESO (w/ Rauw Alejandro)',
 'El Gavilán (w/ Tony Aguirre, Peso Pluma)',
 'EL HECHIZO (w/ Peso Pluma)',
 'Malas Decisiones',
 'Siempre Pendientes (w/ Luis R Conriquez)',
 'Shakira: Bzrp Music Sessions, Vol. 53 (w/ Shakira)',
 'Los Collares (w/ El Padrinito Toys)',
 'Yandel 150 (w/ Feid

In [40]:
data_mex = {'Song': songs, 'Artist': artists}

In [41]:
data_mex

{'Song': ['un x100to (w/ Bad Bunny)',
  'Ella Baila Sola (w/ Peso Pluma)',
  'La Bebe',
  'El Azul (w/ Peso Pluma)',
  'PRC (w/ Natanael Cano)',
  'AMG (w/ Peso Pluma, Gabito Ballesteros)',
  'Por las Noches',
  'El Gordo Trae El Mando',
  'Chanel (w/ Peso Pluma)',
  'Fin de Semana (w/ Junior H)',
  'Ch y la Pizza (w/ Natanael Cano)',
  'Shorty Party (w/ La Kelly)',
  'TQG (w/ Shakira)',
  'Que Vuelvas (w/ Grupo Frontera)',
  'El Tsurito (w/ Gabito Ballesteros, Peso Pluma)',
  'Las Morras (w/ Blessd)',
  'Bebe Dame (w/ Grupo Frontera)',
  'Frágil (w/ Grupo Frontera)',
  'Rosa Pastel (w/ Jasiel Nuñez)',
  'Classy 101 (w/ Young Miko)',
  'Igualito a Mi Apá (w/ Peso Pluma)',
  'Di Que Si (w/ Grupo Frontera)',
  'En Paris (w/ Junior H)',
  'BESO (w/ Rauw Alejandro)',
  'El Gavilán (w/ Tony Aguirre, Peso Pluma)',
  'EL HECHIZO (w/ Peso Pluma)',
  'Malas Decisiones',
  'Siempre Pendientes (w/ Luis R Conriquez)',
  'Shakira: Bzrp Music Sessions, Vol. 53 (w/ Shakira)',
  'Los Collares (w/ El P

In [42]:
df_mex = pd.DataFrame(data_mex)

In [43]:
df_mex

| | Song | Artist |
|---|---|---|
| 0 | un x100to (w/ Bad Bunny) | Grupo Frontera |
| 1 | Ella Baila Sola (w/ Peso Pluma) | Eslabon Armado |
| 2 | La Bebe | Yng Lvcas |
| 3 | El Azul (w/ Peso Pluma) | Junior H |
| 4 | PRC (w/ Natanael Cano) | Peso Pluma |
| ... | ... | ... |
| 96 | El Chamaquito (w/ Angel Cervantes) | Virlan Garcia |
| 97 | Me Rehúso | Danny Ocean |
| 98 | Uno mas Uno Igual a Zero (w/ Tony Aguirre) | Abraham Vazquez |
| 99 | Desesperados (w/ Chencho Corleone) | Rauw Alejandro |


In [44]:
df_mex[df_mex['Artist'].str.contains('Harry')]

| | Song | Artist |
|---|---|---|
| 67 | As It Was | Harry Styles |


In [45]:
df_mex[df_mex['Artist'].str.contains('Luis')]

| | Song | Artist |
|---|---|---|
| 24 | El Gavilán (w/ Tony Aguirre, Peso Pluma) | Luis R Conriquez |
| 81 | Ahora Te Puedes Marchar | Luis Miguel |


In [46]:
df_mex[df_mex['Artist'].str.contains('Peso')]

| | Song | Artist |
|---|---|---|
| 4 | PRC (w/ Natanael Cano) | Peso Pluma |
| 6 | Por las Noches | Peso Pluma |
| 15 | Las Morras (w/ Blessd) | Peso Pluma |
| 18 | Rosa Pastel (w/ Jasiel Nuñez) | Peso Pluma |
| 27 | Siempre Pendientes (w/ Luis R Conriquez) | Peso Pluma |
| 31 | Por las Noches | Peso Pluma |
| 53 | El Belicon (w/ Raul Vega) | Peso Pluma |


In [47]:
df_mex[df_mex['Artist'].str.contains('Bad')]

| | Song | Artist |
|---|---|---|
| 37 | Ojitos Lindos (w/ Bomba Estéreo) | Bad Bunny |
| 38 | Me Porto Bonito (w/ Chencho Corleone) | Bad Bunny |
| 51 | Efecto | Bad Bunny |
| 70 | Neverita | Bad Bunny |
| 72 | Tití Me Preguntó | Bad Bunny |
| 74 | Moscow Mule | Bad Bunny |


In [48]:
df_mex.head(10)

| | Song | Artist |
|---|---|---|
| 0 | un x100to (w/ Bad Bunny) | Grupo Frontera |
| 1 | Ella Baila Sola (w/ Peso Pluma) | Eslabon Armado |
| 2 | La Bebe | Yng Lvcas |
| 3 | El Azul (w/ Peso Pluma) | Junior H |
| 4 | PRC (w/ Natanael Cano) | Peso Pluma |
| 5 | AMG (w/ Peso Pluma, Gabito Ballesteros) | Natanael Cano |
| 6 | Por las Noches | Peso Pluma |
| 7 | El Gordo Trae El Mando | Chino Pacas |
| 8 | Chanel (w/ Peso Pluma) | Becky G |
| 9 | Fin de Semana (w/ Junior H) | Oscar Maydon |


Now that we have two DataFrames, one with the most popular songs in Mexico and one with the current Billboard Hot 100 (a US chart), we'll build another one with essential, all-time songs.
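Once several chart tables exist, one way to build a single recommendation pool is `pd.concat` with a column recording where each row came from, followed by dropping repeated songs. The `Source` labels, the lower-cased song-plus-artist key and the name `combine_charts` are assumptions for illustration; titles that differ only in credits such as `(w/ ...)` would still count as different songs.

```python
import pandas as pd


def combine_charts(charts):
    """Stack {source_name: DataFrame} tables into one pool without duplicate songs."""
    frames = [chart[['Song', 'Artist']].assign(Source=name) for name, chart in charts.items()]
    pool = pd.concat(frames, ignore_index=True)
    # Case-insensitive key, so 'Por Las Noches' and 'Por las Noches' by the same artist collapse
    key = pool['Song'].str.lower().str.strip() + '|' + pool['Artist'].str.lower().str.strip()
    # duplicated() keeps the first occurrence, so earlier sources win ties
    return pool[~key.duplicated()].reset_index(drop=True)
```

With the tables in this notebook, a call might look like `combine_charts({'billboard': df, 'mexico': df_mex})`.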

## Rolling Stone's 500 Greatest Songs of All Time

In [162]:
url = 'https://cs.uwaterloo.ca/~dtompkin/music/list/Best9.html'

In [163]:
response = requests.get(url)
response

<Response [200]>

In [164]:
soup = BeautifulSoup(response.content, 'html.parser')

In [165]:
soup.select('td a ')

[<a href="Best.html">Best</a>,
 <a href="javascript:document.getElementById('POWERTRK_155-14').play()"><img src="../play.png"/></a>,
 <a name="tPOWERTRK_155-14"></a>,
 <a href="../artist/B/B167.html">Bob Dylan</a>,
 <a href="../title/L.html#tPOWERTRK_155-14">Like a Rolling Stone</a>,
 <a href="../bpm/96.html"> 95.6</a>,
 <a href="../year/1965.html">1965</a>,
 <a href="../genre/genre31.html">Rock 1960s</a>,
 <a href="../disc/POWERTRK_155.html">POWERTRK_155-14</a>,
 <a href="../track/POWERTRK_155/POWERTRK_155-14.html">details...</a>,
 <a href="javascript:document.getElementById('ESSENTLS_012-13').play()"><img src="../play.png"/></a>,
 <a name="tESSENTLS_012-13"></a>,
 <a href="../artist/R/R28.html">The Rolling Stones</a>,
 <a href="../title/S.html#tESSENTLS_012-13">Satisfaction</a>,
 <a href="../bpm/135.html">134.7</a>,
 <a href="../year/1985.html">1985</a>,
 <a href="../genre/genre33.html">Rock 1980s</a>,
 <a href="../disc/ESSENTLS_012.html">ESSENTLS_012-13</a>,
 <a href="../track/ESSEN

In [166]:
soup.find("a", href="../title/L.html#tPOWERTRK_155-14")

<a href="../title/L.html#tPOWERTRK_155-14">Like a Rolling Stone</a>

In [167]:
soup.select("a")

[<a href="../../index.html">Dave Tompkins</a>,
 <a href="../index.html">Music Database</a>,
 <a class="singlemenu" href="../index.html">INTRODUCTION</a>,
 <a class="singlemenu" href="../disc/index.html">DISCS</a>,
 <a class="singlemenu" href="../covers/index.html">COVERS</a>,
 <a class="singlemenu" href="../genre/index.html">GENRE</a>,
 <a class="singlemenu" href="../artisttag/index.html">ARTIST TAGS</a>,
 <a class="singlemenu" href="../year/index.html">YEAR</a>,
 <a class="singlemenu" href="../bpm/index.html">BPM</a>,
 <a href="../artist/0/index.html">#</a>,
 <a href="../artist/A/index.html">A</a>,
 <a href="../artist/B/index.html">B</a>,
 <a href="../artist/C/index.html">C</a>,
 <a href="../artist/D/index.html">D</a>,
 <a href="../artist/E/index.html">E</a>,
 <a href="../artist/F/index.html">F</a>,
 <a href="../artist/G/index.html">G</a>,
 <a href="../artist/H/index.html">H</a>,
 <a href="../artist/I/index.html">I</a>,
 <a href="../artist/J/index.html">J</a>,
 <a href="../artist/K/in

In [168]:
soup.select("tbody td.musichead")

[]

In [169]:
for i in range(60):
    print(i,soup.select("tbody td")[i])

0 <td>#</td>
1 <td><img src="../playblack.png"/></td>
2 <td>ARTIST</td>
3 <td>TITLE</td>
4 <td>TIME</td>
5 <td>BPM</td>
6 <td>YEAR</td>
7 <td>GENRE</td>
8 <td>DISC-TRACK</td>
9 <td>DETAILS</td>
10 <td>1</td>
11 <td><a href="javascript:document.getElementById('POWERTRK_155-14').play()"><img src="../play.png"/></a><audio id="POWERTRK_155-14" src="../samples/POWERTRK_155/sample-POWERTRK_155-14.mp3"></audio></td>
12 <td><a name="tPOWERTRK_155-14"></a><a href="../artist/B/B167.html">Bob Dylan</a></td>
13 <td><a href="../title/L.html#tPOWERTRK_155-14">Like a Rolling Stone</a></td>
14 <td> 6:04</td>
15 <td><a href="../bpm/96.html"> 95.6</a></td>
16 <td><a href="../year/1965.html">1965</a></td>
17 <td><a href="../genre/genre31.html">Rock 1960s</a></td>
18 <td class="djddid"><a href="../disc/POWERTRK_155.html">POWERTRK_155-14</a></td>
19 <td>(<a href="../track/POWERTRK_155/POWERTRK_155-14.html">details...</a>)</td>
20 <td>2</td>
21 <td><a href="javascript:document.getElementById('ESSENTLS_012-1

In [170]:
soup.select("tbody td")[13]

<td><a href="../title/L.html#tPOWERTRK_155-14">Like a Rolling Stone</a></td>

In [171]:
soup.select("tbody td")[23]

<td><a href="../title/S.html#tESSENTLS_012-13">Satisfaction</a></td>

In [182]:
soup.select('tr > td:nth-child(4)')[0].get_text().strip()

'TITLE'

In [224]:
soup.select('tbody > tr > td:nth-child(4)')

[<td>TITLE</td>,
 <td><a href="../title/L.html#tPOWERTRK_155-14">Like a Rolling Stone</a></td>,
 <td><a href="../title/S.html#tESSENTLS_012-13">Satisfaction</a></td>,
 <td><a href="../title/I.html#tPOWERTRK_023-07">Imagine</a></td>,
 <td><a href="../title/W.html#tPOWERTRK_085-07">What's Going On</a></td>,
 <td><a href="../title/R.html#tPOWERTRK_046-16">Respect</a></td>,
 <td><a href="../title/G.html#tPOWERTRK_032-08">Good Vibrations</a></td>,
 <td><a href="../title/J.html#tPOWERTRK_021-12">Johnny B. Goode</a></td>,
 <td><a href="../title/H.html#tBEATLES__BLA-13">Hey Jude</a></td>,
 <td><a href="../title/S.html#tPOWERTRK_022-01">Smells Like Teen Spirit</a></td>,
 <td><a href="../title/W.html#tPOWERTRK_071-16">What'd I Say (Parts 1 And 2)</a></td>,
 <td><a href="../title/M.html#tPOWERTRK_043-14">My Generation</a></td>,
 <td><a href="../title/A.html#tDTRANDOM_005-16">A Change Is Gonna Come</a></td>,
 <td><a href="../title/Y.html#tBEATLES__RDA-13">Yesterday</a></td>,
 <td><a href="../title

In [221]:
soup.select('tbody > tr > td:nth-child(4) a')[0].get_text()

'Like a Rolling Stone'

In [204]:
RS_500_Songs=[]
title_links = soup.select('tbody > tr > td:nth-child(4) a')
for i in range(500):
    song = title_links[i].get_text()
    RS_500_Songs.append(song)  # append the title, not the list itself

In [205]:
'''RS_500_Songs=[]
for i in range(13,5013,10):
    song = soup.select("tbody td")[i].get_text()
    RS_500_Songs.append(song)
    '''

'RS_500_Songs=[]\nfor i in range(13,5013,10):\n    song = soup.select("tbody td")[i].get_text()\n    RS_500_Songs.append(song)\n    '

In [206]:
len(RS_500_Songs)

500

In [61]:
RS_500_Songs

['Like a Rolling Stone',
 'Satisfaction',
 'Imagine',
 "What's Going On",
 'Respect',
 'Good Vibrations',
 'Johnny B. Goode',
 'Hey Jude',
 'Smells Like Teen Spirit',
 "What'd I Say (Parts 1 And 2)",
 'My Generation',
 'A Change Is Gonna Come',
 'Yesterday',
 "Blowin' in the Wind",
 'London Calling',
 'I Want to Hold Your Hand',
 'Purple Haze',
 'Maybellene',
 'Hound Dog',
 'Let It Be',
 'Born to Run',
 'Be My Baby',
 'In My Life',
 'People Get Ready',
 'God Only Knows',
 'A Day in the Life',
 'Layla',
 "(Sittin' On) the Dock of the Bay",
 'Help!',
 'I Walk the Line',
 'Stairway to Heaven',
 'Sympathy for the Devil',
 'River Deep - Mountain High',
 "You've Lost That Lovin' Feelin'",
 'Light My Fire',
 'One',
 'No Woman No Cry',
 'Gimme Shelter',
 "That'll Be the Day",
 'Dancing in the Street',
 'The Weight',
 'Waterloo Sunset',
 'Tutti-Frutti',
 'Georgia on My Mind',
 'Heartbreak Hotel',
 'Heroes',
 'Bridge Over Troubled Water',
 'All Along the Watchtower',
 'Hotel California',
 'Track

In [207]:
soup.select("tbody td")[12].get_text()

'Bob Dylan'

In [241]:
soup.select('tbody > tr > td:nth-child(3) a')[1].get_text()

'Bob Dylan'

In [244]:
RS_500_artist=[]
for i in range(1,1002,2):
    artist = soup.select('tbody > tr > td:nth-child(3) a')[i].get_text()
    RS_500_artist.append(artist)

In [245]:
'''RS_500_artist=[]
for i in range(12,5012,10):
    artist = soup.select("tbody td")[i].get_text()
    RS_500_artist.append(artist)
    '''

'RS_500_artist=[]\nfor i in range(12,5012,10):\n    artist = soup.select("tbody td")[i].get_text()\n    RS_500_artist.append(artist)\n    '

In [246]:
len(RS_500_artist)

501

In [65]:
# tbody tr, 3rd child

In [247]:
#soup.select("tbody td:nth-child(3)")

In [67]:
RS_500_artist

['Bob Dylan',
 'The Rolling Stones',
 'John Lennon',
 'Marvin Gaye',
 'Aretha Franklin',
 'The Beach Boys',
 'Chuck Berry',
 'The Beatles',
 'Nirvana',
 'Ray Charles',
 'The Who',
 'Sam Cooke',
 'The Beatles',
 'Bob Dylan',
 'The Clash',
 'The Beatles',
 'Jimi Hendrix',
 'Chuck Berry',
 'Elvis Presley',
 'The Beatles',
 'Bruce Springsteen',
 'The Ronettes',
 'The Beatles',
 'The Impressions',
 'The Beach Boys',
 'The Beatles',
 'Derek & The Dominos',
 'Otis Redding',
 'The Beatles',
 'Johnny Cash',
 'Led Zeppelin',
 'The Rolling Stones',
 'Tina Turner',
 'The Righteous Brothers',
 'The Doors',
 'U2',
 'Bob Marley & The Wailers',
 'The Rolling Stones',
 'Buddy Holly',
 'Martha Reeves & The Vandellas',
 'The Band',
 'The Kinks',
 'Little Richard',
 'Ray Charles',
 'Elvis Presley',
 'David Bowie',
 'Simon & Garfunkel',
 'Jimi Hendrix',
 'The Eagles',
 'Smokey Robinson',
 'Grandmaster Flash & Melle Mel',
 'Prince',
 'Sex Pistols',
 'Percy Sledge',
 'The Kingsmen',
 'Little Richard',
 'Proc

In [248]:
RS_data = {'Song': RS_500_Songs, 'Artist': RS_500_artist}

In [249]:
df_RS = pd.DataFrame(RS_data)

ValueError: All arrays must be of the same length

In [None]:
df_RS
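The `ValueError` comes from the two lists having different lengths: `range(1, 1002, 2)` yields 501 indices, so `RS_500_artist` ends up with 501 names against 500 titles. Reading each row's artist and title together avoids that index arithmetic altogether. A sketch, assuming every data row has the artist link in the third cell and the title link in the fourth, as in the cells printed above; `parse_tompkins` is a hypothetical name.

```python
from bs4 import BeautifulSoup
import pandas as pd


def parse_tompkins(html, n=500):
    """Pair each row's artist (3rd cell) with its title (4th cell)."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for tr in soup.select('tbody > tr'):
        # a[href] skips the <a name="..."> anchor that shares the artist cell
        artist = tr.select_one(':scope > td:nth-child(3) a[href]')
        title = tr.select_one(':scope > td:nth-child(4) a')
        if artist is None or title is None:  # the header row ('ARTIST', 'TITLE') has no links
            continue
        rows.append({'Song': title.get_text().strip(), 'Artist': artist.get_text().strip()})
        if len(rows) == n:
            break
    return pd.DataFrame(rows, columns=['Song', 'Artist'])
```

Since both columns are filled from the same row, the DataFrame can never be built from lists of different lengths.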

# Practice web scraping

Retrieve an arbitrary Wikipedia page of "Python" and create a list of links on that page: `url = 'https://en.wikipedia.org/wiki/Python'`

In [71]:
url = 'https://en.wikipedia.org/wiki/Python'

In [72]:
response = requests.get(url)
response

<Response [200]>

In [73]:
soup = BeautifulSoup(response.content, 'html.parser')

In [74]:
soup.find_all('a')

[<a class="mw-jump-link" href="#bodyContent">Jump to content</a>,
 <a accesskey="z" href="/wiki/Main_Page" title="Visit the main page [z]"><span>Main page</span></a>,
 <a href="/wiki/Wikipedia:Contents" title="Guides to browsing Wikipedia"><span>Contents</span></a>,
 <a href="/wiki/Portal:Current_events" title="Articles related to current events"><span>Current events</span></a>,
 <a accesskey="x" href="/wiki/Special:Random" title="Visit a randomly selected article [x]"><span>Random article</span></a>,
 <a href="/wiki/Wikipedia:About" title="Learn about Wikipedia and how it works"><span>About Wikipedia</span></a>,
 <a href="//en.wikipedia.org/wiki/Wikipedia:Contact_us" title="How to contact Wikipedia"><span>Contact us</span></a>,
 <a href="https://donate.wikimedia.org/wiki/Special:FundraiserRedirector?utm_source=donate&amp;utm_medium=sidebar&amp;utm_campaign=C13_en.wikipedia.org&amp;uselang=en" title="Support us by donating to the Wikimedia Foundation"><span>Donate</span></a>,
 <a href=

In [75]:
links=[]
for i in soup.find_all('a'):
    href = i.get('href')
    if href and href.startswith('/wiki/'):
        links.append(href)
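The resulting list mixes navigation pages (`Special:`, `Help:`, `Wikipedia:`) with articles and repeats entries such as `/wiki/Python`. A sketch that keeps only article links, strips `#section` fragments, removes duplicates while preserving page order, and turns relative paths into absolute URLs with `urllib.parse.urljoin`. Treating any path containing a colon as a non-article page is a simplifying assumption, since some real article titles contain colons; `article_links` is a hypothetical name.

```python
from urllib.parse import urldefrag, urljoin

from bs4 import BeautifulSoup


def article_links(html, base='https://en.wikipedia.org/wiki/Python'):
    """Absolute, de-duplicated article URLs in the order they appear on the page."""
    soup = BeautifulSoup(html, 'html.parser')
    seen = set()
    links = []
    for a in soup.find_all('a', href=True):
        href, _ = urldefrag(a['href'])  # '/wiki/PERQ#PERQ_3' -> '/wiki/PERQ'
        if not href.startswith('/wiki/') or ':' in href:
            continue
        url = urljoin(base, href)
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```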

In [76]:
links

['/wiki/Main_Page',
 '/wiki/Wikipedia:Contents',
 '/wiki/Portal:Current_events',
 '/wiki/Special:Random',
 '/wiki/Wikipedia:About',
 '/wiki/Help:Contents',
 '/wiki/Help:Introduction',
 '/wiki/Wikipedia:Community_portal',
 '/wiki/Special:RecentChanges',
 '/wiki/Wikipedia:File_upload_wizard',
 '/wiki/Main_Page',
 '/wiki/Special:Search',
 '/wiki/Help:Introduction',
 '/wiki/Special:MyContributions',
 '/wiki/Special:MyTalk',
 '/wiki/Python',
 '/wiki/Talk:Python',
 '/wiki/Python',
 '/wiki/Python',
 '/wiki/Special:WhatLinksHere/Python',
 '/wiki/Special:RecentChangesLinked/Python',
 '/wiki/Wikipedia:File_Upload_Wizard',
 '/wiki/Special:SpecialPages',
 '/wiki/Pythonidae',
 '/wiki/Python_(genus)',
 '/wiki/Python_(mythology)',
 '/wiki/Python_(programming_language)',
 '/wiki/CMU_Common_Lisp',
 '/wiki/PERQ#PERQ_3',
 '/wiki/Python_of_Aenus',
 '/wiki/Python_(painter)',
 '/wiki/Python_of_Byzantium',
 '/wiki/Python_of_Catana',
 '/wiki/Python_Anghelo',
 '/wiki/Python_(Efteling)',
 '/wiki/Python_(Busch_G

Find the number of titles that have changed in the United States Code since its last release point: `url = 'http://uscode.house.gov/download/download.shtml'`

In [77]:
url = 'http://uscode.house.gov/download/download.shtml'
response = requests.get(url)
response

<Response [200]>

In [78]:
soup = BeautifulSoup(response.content, 'html.parser')

#### DOCUMENT NOT FOUND
The document you were looking for does not exist. 

In [79]:
url = 'https://www.fbi.gov/wanted/topten'
response = requests.get(url)
response

<Response [200]>

In [80]:
soup = BeautifulSoup(response.content, 'html.parser')

In [81]:
soup.select('ul a')

[<a href="https://www.fbi.gov/wanted">Most Wanted</a>,
 <a href="https://www.fbi.gov/news">News</a>,
 <a href="https://www.fbi.gov/investigate">What We Investigate</a>,
 <a href="https://www.fbi.gov/how-we-can-help-you">How We Can Help You</a>,
 <a href="https://www.fbi.gov/tips">Submit a Tip</a>,
 <a href="https://www.fbi.gov/about">About</a>,
 <a href="https://www.fbi.gov/contact-us">Contact Us</a>,
 <a href="https://www.facebook.com/FBI"><svg aria-labelledby="title" height="20" role="img" version="1.1" viewbox="0 0 16 16" width="20" xmlns="http://www.w3.org/2000/svg">
 <title>Facebook Icon</title>
 <path class="facebook-icon" d="m15.115 0q0.36458 0 0.62499 0.26042 0.26048 0.26042 0.26048 0.62501v14.229q0 0.36458-0.26042 0.62499-0.26032 0.26048-0.62496 0.26048h-4.073v-6.1979h2.073l0.3125-2.4166h-2.3854v-1.5417q0-0.58333 0.24478-0.87501 0.24478-0.29166 0.95312-0.29166l1.2708-0.010416v-2.1563q-0.65626-0.09375-1.8542-0.09375-1.4167 0-2.2656 0.83333-0.8493 0.8341-0.8493 2.355v1.7813h-2.0

In [82]:
soup.select('ul a')[4]

<a href="https://www.fbi.gov/tips">Submit a Tip</a>

In [83]:
for i, link in enumerate(soup.select('ul a')[:80]):  # select once; stops early if there are fewer than 80 links
    print(i, link)

0 <a href="https://www.fbi.gov/wanted">Most Wanted</a>
1 <a href="https://www.fbi.gov/news">News</a>
2 <a href="https://www.fbi.gov/investigate">What We Investigate</a>
3 <a href="https://www.fbi.gov/how-we-can-help-you">How We Can Help You</a>
4 <a href="https://www.fbi.gov/tips">Submit a Tip</a>
5 <a href="https://www.fbi.gov/about">About</a>
6 <a href="https://www.fbi.gov/contact-us">Contact Us</a>
7 <a href="https://www.facebook.com/FBI"><svg aria-labelledby="title" height="20" role="img" version="1.1" viewbox="0 0 16 16" width="20" xmlns="http://www.w3.org/2000/svg">
<title>Facebook Icon</title>
<path class="facebook-icon" d="m15.115 0q0.36458 0 0.62499 0.26042 0.26048 0.26042 0.26048 0.62501v14.229q0 0.36458-0.26042 0.62499-0.26032 0.26048-0.62496 0.26048h-4.073v-6.1979h2.073l0.3125-2.4166h-2.3854v-1.5417q0-0.58333 0.24478-0.87501 0.24478-0.29166 0.95312-0.29166l1.2708-0.010416v-2.1563q-0.65626-0.09375-1.8542-0.09375-1.4167 0-2.2656 0.83333-0.8493 0.8341-0.8493 2.355v1.7813h-2.08

In [84]:
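# select() takes a CSS selector; the class_ keyword is not applied as a filter here,
# so this returns the text of the first <div> on the page (see the output below).
# Filtering by class with select() would be soup.select('div.focuspoint').
soup.select('div', class_='focuspoint')[0].get_text().strip()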

"An official website of the United States government. Here's\xa0how\xa0you\xa0know\n\n\n\n\n\n\n\nOfficial websites use .gov\n\n\nA .gov website belongs to an official government organization in the United States.\n\n\n\n\n\n\n\nSecure .gov websites use HTTPS\n\n\nA lock () or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.\n\n\n\n\n\n\n\n\n\n\nSubmit Search\n\n\n\nSearch\n\n\nFBI\n\n\nMore\n\n \n\n\n\n\n\n\nMost Wanted\n\nNews\n\nWhat We Investigate\n\nHow We Can Help You\n\nSubmit a Tip\n\nAbout\n\nContact Us\n\n\n\n\nHome\n\n\n\n\n\n\nMost Wanted\n\n\n\n\n\nFacebook Icon\n\n\n\n\n\nEmail Icon\n\n\n\n\n\nTwitter Icon\n\n\n\n\n\nYoutube Icon\n\n\n\n\n\nFlickr Icon\n\n\n\n\n\nLinkedIn Icon\n\n\n\n\n\nInstagram Icon\n\n\n\n\n\n\n\nSearch FBI\n\n\n\nSubmit Search\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFBIFederal Bureau of Investigation\n\n\n\n\n\n\n\nTen Most Wanted Fugitives\n\n\n\nMost Wanted\n\n\n\n\n\nTen Most Wante
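Because `select()` expects a CSS selector, the `class_='focuspoint'` keyword above does not narrow anything down. The output is the text of the first `<div>` on the page. The sketch below uses a small made-up HTML snippet to contrast the two class filters BeautifulSoup does support: the CSS form `div.focuspoint` with `select()`, and the `class_` keyword with `find_all()`.

```python
from bs4 import BeautifulSoup

# Made-up snippet for illustration; "focuspoint" is the class used in the cell above.
html = """
<div id="wrapper">
  <div class="focuspoint">Fugitive A</div>
  <div class="banner">Not a fugitive</div>
  <div class="focuspoint">Fugitive B</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector syntax: tag.class
css_hits = [d.get_text(strip=True) for d in soup.select("div.focuspoint")]

# find_all() takes the class filter as the class_ keyword
kw_hits = [d.get_text(strip=True) for d in soup.find_all("div", class_="focuspoint")]

# select("div") with no class restriction starts at the outer wrapper div
first_div = soup.select("div")[0].get_text(" ", strip=True)

print(css_hits)   # ['Fugitive A', 'Fugitive B']
print(kw_hits)    # ['Fugitive A', 'Fugitive B']
print(first_div)  # Fugitive A Not a fugitive Fugitive B
```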

In [85]:
soup.select('a[href^="https://www.fbi.gov/wanted/topten/"]')[3].get_text().strip()

'YULAN ADONAY ARCHAGA CARIAS'

In [86]:
soup.select('a[href^="https://www.fbi.gov/wanted/topten/"]')[5].get_text().strip()

'BHADRESHKUMAR CHETANBHAI PATEL'

In [87]:
len(soup.select('a[href^="https://www.fbi.gov/wanted/topten/"]'))

24

In [88]:
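links = soup.select('a[href^="https://www.fbi.gov/wanted/topten/"]')  # select once, reuse below
Top_Criminals = []
# From index 3 on, every second link holds a fugitive's name (cf. indices 3 and 5 above)
for i in range(3, 23, 2):
    Top_Criminals.append(links[i].get_text().strip())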


In [89]:
Top_Criminals

['YULAN ADONAY ARCHAGA CARIAS',
 'BHADRESHKUMAR CHETANBHAI PATEL',
 'WILVER VILLEGAS-PALOMINO',
 'ALEJANDRO ROSALES CASTILLO',
 'RUJA IGNATOVA',
 'ARNOLDO JIMENEZ',
 'OMAR ALEXANDER CARDENAS',
 'ALEXIS FLORES',
 'MICHAEL JAMES PRATT',
 'JOSE RODOLFO VILLARREAL-HERNANDEZ']
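The `range(3, 23, 2)` indices depend on the exact link layout at scrape time. Here is a sketch of an index-free alternative. It assumes, without checking the live page, that each fugitive has an image link with no text plus a link whose text is the name. It keeps the distinct non-empty link texts in page order. On the real page, other non-name links under the same prefix may still need dropping. `linked_names` and the sample names are hypothetical.

```python
from __future__ import annotations

from bs4 import BeautifulSoup

PREFIX = "https://www.fbi.gov/wanted/topten/"


def linked_names(html: str, prefix: str = PREFIX) -> list[str]:
    """Return the distinct, non-empty texts of links whose href starts with prefix."""
    soup = BeautifulSoup(html, "html.parser")
    names: list[str] = []
    for a in soup.select(f'a[href^="{prefix}"]'):
        text = a.get_text(strip=True)
        if text and text not in names:  # skip image-only links and repeats
            names.append(text)
    return names


# Usage on a made-up snippet (names are invented)
sample = """
<a href="https://www.fbi.gov/wanted/topten/jane-roe"><img src="roe.jpg"/></a>
<a href="https://www.fbi.gov/wanted/topten/jane-roe">JANE ROE</a>
<a href="https://www.fbi.gov/wanted/topten/john-doe"><img src="doe.jpg"/></a>
<a href="https://www.fbi.gov/wanted/topten/john-doe">JOHN DOE</a>
<a href="https://www.fbi.gov/news">News</a>
"""
print(linked_names(sample))  # ['JANE ROE', 'JOHN DOE']
```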

Display the 20 latest earthquakes info (date, time, latitude, longitude and region name) by the EMSC as a pandas dataframe: url = 'https://www.emsc-csem.org/Earthquake/'

In [250]:
url = 'https://www.emsc-csem.org/Earthquake/'
response = requests.get(url)
response

<Response [200]>

In [251]:
soup = BeautifulSoup(response.content, 'html.parser')

In [252]:
soup.select('div')

[<div id="contenu"><!-- Start container -->
 <div id="bandeau"><style> .ban022{background:url('https://static1.emsc.eu/Css/img/spriteBannApp.jpg') no-repeat;} .ban0{background:#B40015;/*#000;*/width:100%;height:100%;border:0;} .ban1{vertical-align:top;text-align:center;padding-top:4px;width:150px;} 
 			.ban2{position:relative; vertical-align:middle;text-align:center; color:#fff; background-position:0px -41px;/*10px -41px;*/} 
 			.ban3{width:140px;padding-left:5px;text-align:center;background-position:-640px -41px;}
 			#ejs_server_heure {position:absolute; bottom:0; width:100%;}
 			#bandeau{background:#B40015;}</style><table cellpadding="0" cellspacing="0" class="ban0"><tr><td class="ban1" onclick="window.location.href='https://www.emsc-csem.org';" onmouseover="this.style.cursor='pointer';">
 <img alt="logo - home" src="https://static2.emsc.eu/Css/img/logo.png" style="width:150px;height:81px;"/></td>
 <td class="ban2 ban022" onclick="window.location.href='https://www.emsc-csem.org';

In [253]:
soup.find('table', {'class': 'table'})

In [254]:
soup.select('tbody tr')[0].get_text()

'6IVearthquake2023-05-04\xa0\xa0\xa023:05:56.717min ago19.35\xa0N\xa0\xa0155.09\xa0W\xa0\xa08ML3.8\xa0ISLAND OF HAWAII, HAWAII2023-05-04 23:19'

In [255]:
for i in range(20):
    print(i,soup.select('tbody tr')[i].get_text().split())

0 ['6IVearthquake2023-05-04', '23:05:56.717min', 'ago19.35', 'N', '155.09', 'W', '8ML3.8', 'ISLAND', 'OF', 'HAWAII,', 'HAWAII2023-05-04', '23:19']
1 ['earthquake2023-05-04', '23:05:54.017min', 'ago46.31', 'N', '14.82', 'E', '10ML2.1', 'SLOVENIA2023-05-04', '23:10']
2 ['earthquake2023-05-04', '22:51:04.332min', 'ago34.61', 'N', '25.91', 'E', '30ML3.6', 'CRETE,', 'GREECE2023-05-04', '23:08']
3 ['earthquake2023-05-04', '22:39:30.843min', 'ago37.74', 'N', '15.04', 'E', '3ML2.1', 'SICILY,', 'ITALY2023-05-04', '22:49']
4 ['earthquake2023-05-04', '22:28:29.054min', 'ago24.67', 'N', '94.17', 'E', '10', 'M2.7', 'MYANMAR-INDIA', 'BORDER', 'REGION2023-05-04', '22:45']
5 ['earthquake2023-05-04', '22:20:46.11hr', '02min', 'ago8.40', 'S', '160.30', 'E', '100mb5.1', 'SOLOMON', 'ISLANDS2023-05-04', '22:47']
6 ['earthquake2023-05-04', '22:03:55.51hr', '19min', 'ago44.85', 'N', '26.41', 'E', '18ML2.7', 'ROMANIA2023-05-04', '22:39']
7 ['earthquake2023-05-04', '21:54:29.81hr', '28min', 'ago39.19', 'N', '4

In [256]:
soup.select('tbody')[0].get_text()

"6IVearthquake2023-05-04\xa0\xa0\xa023:05:56.717min ago19.35\xa0N\xa0\xa0155.09\xa0W\xa0\xa08ML3.8\xa0ISLAND OF HAWAII, HAWAII2023-05-04 23:19\nearthquake2023-05-04\xa0\xa0\xa023:05:54.017min ago46.31\xa0N\xa0\xa014.82\xa0E\xa0\xa010ML2.1\xa0SLOVENIA2023-05-04 23:10\nearthquake2023-05-04\xa0\xa0\xa022:51:04.332min ago34.61\xa0N\xa0\xa025.91\xa0E\xa0\xa030ML3.6\xa0CRETE, GREECE2023-05-04 23:08\nearthquake2023-05-04\xa0\xa0\xa022:39:30.843min ago37.74\xa0N\xa0\xa015.04\xa0E\xa0\xa03ML2.1\xa0SICILY, ITALY2023-05-04 22:49\nearthquake2023-05-04\xa0\xa0\xa022:28:29.054min ago24.67\xa0N\xa0\xa094.17\xa0E\xa0\xa010 M2.7\xa0MYANMAR-INDIA BORDER REGION2023-05-04 22:45\nearthquake2023-05-04\xa0\xa0\xa022:20:46.11hr 02min ago8.40\xa0S\xa0\xa0160.30\xa0E\xa0\xa0100mb5.1\xa0SOLOMON ISLANDS2023-05-04 22:47\nearthquake2023-05-04\xa0\xa0\xa022:03:55.51hr 19min ago44.85\xa0N\xa0\xa026.41\xa0E\xa0\xa018ML2.7\xa0ROMANIA2023-05-04 22:39\nearthquake2023-05-04\xa0\xa0\xa021:54:29.81hr 28min ago39.19\xa0N\xa0

In [259]:
soup.select('tr > td:nth-child(12)')[0].get_text().strip()  # location (region name)

'ISLAND OF HAWAII, HAWAII'

In [258]:
for i in range(20):
    print(soup.select('tr > td:nth-child(12)')[i].get_text().strip())

ISLAND OF HAWAII, HAWAII
SLOVENIA
CRETE, GREECE
SICILY, ITALY
MYANMAR-INDIA BORDER REGION
SOLOMON ISLANDS
ROMANIA
EASTERN TURKEY
TARAPACA, CHILE
SULAWESI, INDONESIA
FLORES SEA
CENTRAL TURKEY
PANAY, PHILIPPINES
JUJUY, ARGENTINA
PYRENEES
TONGA REGION
SOUTH ISLAND OF NEW ZEALAND
NEPAL
NEPAL
CENTRAL CALIFORNIA


In [273]:
for i in range(1,13,1):
    print(i, soup.select('tr > td:nth-child('+str(i)+')')[0].get_text().strip())

1 
2 Current time: 2023-05-04 23:23:21 UTC
3 Member access

Name 


Pwd
4 earthquake2023-05-04   23:05:56.717min ago
5 19.35
6 N
7 155.09
8 W
9 8
10 ML
11 3.8
12 ISLAND OF HAWAII, HAWAII


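### Column map (`td:nth-child` position → field)

    4. Date and time
    5. Latitude (degrees)
    6. Latitude hemisphere (N/S)
    7. Longitude (degrees)
    8. Longitude hemisphere (W/E)
    9. Depth
    10. Magnitude type
    11. Magnitude
    12. Location (region name)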
    

In [274]:
for i in range(20):
    print(soup.select('tr > td:nth-child(11)')[i].get_text().strip())

3.8
2.1
3.6
2.1
2.7
5.1
2.7
3.1
2.7
4.4
3.0
3.0
3.2
3.5
1.9
4.9
3.1
2.8
3.3
2.0
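The loop above re-runs `soup.select(...)` on every iteration and keeps one list per column. The sketch below takes a single pass and builds one dict per table row from the `nth-child` positions worked out earlier (5 to 12). It assumes each earthquake row has 12 `<td>` children and that all of a row's children are `<td>` elements. Rows with fewer cells, such as the page-header rows found while exploring the columns, are skipped. `COLUMNS` and `parse_rows` are names introduced for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical mapping: 1-based td position -> field name, per the positions found above
COLUMNS = {
    5: "latitude",
    6: "lat_ns",
    7: "longitude",
    8: "lon_we",
    9: "depth",
    11: "magnitude",
    12: "region",
}


def parse_rows(html, limit=20):
    """Return up to `limit` dicts, one per <tr> that has at least 12 <td> children."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        tds = tr.find_all("td", recursive=False)
        if len(tds) < 12:  # header/layout rows, not earthquakes
            continue
        # td positions are 1-based, Python list indices are 0-based
        rows.append({name: tds[pos - 1].get_text(strip=True) for pos, name in COLUMNS.items()})
        if len(rows) == limit:
            break
    return rows


# Usage on a made-up table shaped like the EMSC rows above
quake = ["", "", "", "earthquake2023-05-04", "19.35", "N", "155.09", "W", "8", "ML", "3.8",
         "ISLAND OF HAWAII, HAWAII"]
sample = ("<table><tr><td>Current time</td></tr><tr>"
          + "".join(f"<td>{c}</td>" for c in quake) + "</tr></table>")
print(parse_rows(sample))
```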


In [406]:
Hora=[]
Dia=[]
Latitude_Degree=[]
Latitude_Degree_NS=[]
Longitude_Degree=[] 
Longitude_Degree_WE=[]
Depth=[]
Mag=[]
Ubicacion=[]

In [362]:
date_cells = soup.select('tr > td:nth-child(4)')  # "earthquake<date>\xa0\xa0\xa0<time><age> ago"
lat_cells = soup.select('tr > td:nth-child(5)')
for i in range(50):
    when = date_cells[i].get_text(strip=False).replace('earthquake', '').replace('ago', '')
    Hora.append(when.replace('min', '').split("\xa0\xa0\xa0", 1)[1])  # time; the relative age is still fused onto it
    Dia.append(when.split("\xa0\xa0\xa0", 1)[0])                       # date
    Latitude_Degree.append(lat_cells[i].get_text(strip=False).replace('\t', ''))

In [371]:
Ubicacion

['ISLAND OF HAWAII, HAWAII',
 'SLOVENIA',
 'CRETE, GREECE',
 'SICILY, ITALY',
 'MYANMAR-INDIA BORDER REGION',
 'SOLOMON ISLANDS',
 'ROMANIA',
 'EASTERN TURKEY',
 'TARAPACA, CHILE',
 'SULAWESI, INDONESIA',
 'FLORES SEA',
 'CENTRAL TURKEY',
 'PANAY, PHILIPPINES',
 'JUJUY, ARGENTINA',
 'PYRENEES',
 'TONGA REGION',
 'SOUTH ISLAND OF NEW ZEALAND',
 'NEPAL',
 'NEPAL',
 'CENTRAL CALIFORNIA',
 'ISLAND OF HAWAII, HAWAII',
 'SLOVENIA',
 'CRETE, GREECE',
 'SICILY, ITALY',
 'MYANMAR-INDIA BORDER REGION',
 'SOLOMON ISLANDS',
 'ROMANIA',
 'EASTERN TURKEY',
 'TARAPACA, CHILE',
 'SULAWESI, INDONESIA',
 'FLORES SEA',
 'CENTRAL TURKEY',
 'PANAY, PHILIPPINES',
 'JUJUY, ARGENTINA',
 'PYRENEES',
 'TONGA REGION',
 'SOUTH ISLAND OF NEW ZEALAND',
 'NEPAL',
 'NEPAL',
 'CENTRAL CALIFORNIA']

In [407]:
# Select each column once (td:nth-child position -> list of cells), then read row i from each
cells = {n: soup.select(f'tr > td:nth-child({n})') for n in (4, 5, 6, 7, 8, 11, 12)}
for i in range(50):
    Ubicacion.append(cells[12][i].get_text(strip=True).replace('\t', ''))

    Latitude_Degree.append(cells[5][i].get_text(strip=True).replace('\t', '').strip())
    Latitude_Degree_NS.append(cells[6][i].get_text(strip=True).replace('\t', ''))

    Longitude_Degree.append(cells[7][i].get_text(strip=True).replace('\t', ''))
    Longitude_Degree_WE.append(cells[8][i].get_text(strip=True).replace('\t', ''))

    # Column 4 holds "earthquake<date>\xa0\xa0\xa0<time><age> ago"; split the date from the time
    when = cells[4][i].get_text(strip=False).replace('earthquake', '').replace('ago', '')
    Hora.append(when.replace('min', '').split("\xa0\xa0\xa0", 1)[1])
    Dia.append(when.split("\xa0\xa0\xa0", 1)[0])

    Mag.append(cells[11][i].get_text(strip=True))

In [375]:
soup.select(('tr > td:nth-child(4)'))[i].get_text(strip=False).replace('earthquake', '').replace('ago', '').replace('min','').split("\xa0\xa0\xa0", 1)[1]

'18:30:18.44hr 53 '
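The `replace`/`split` chain leaves the relative age fused onto the time. For example, `'18:30:18.44hr 53 '` above is 18:30:18.4 followed by "4hr 53min ago". In the cell texts printed earlier, the seconds always carry exactly one decimal digit. Under that assumption, a regular expression can pull out the date and time cleanly. `split_event_time` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# "<date>", whitespace (the page uses non-breaking spaces), then "HH:MM:SS.d"
# ASSUMPTION: one decimal digit on the seconds, as in the cells printed above;
# the relative age ("17min ago", "4hr 53min ago") that follows is ignored.
EVENT_TIME = re.compile(r"(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2}\.\d)")


def split_event_time(cell_text: str) -> Optional[Tuple[str, str]]:
    """Return (date, time) from a column-4 cell text, or None if it does not match."""
    match = EVENT_TIME.search(cell_text)
    return match.groups() if match else None


print(split_event_time("earthquake2023-05-04\xa0\xa0\xa023:05:56.717min ago"))
# ('2023-05-04', '23:05:56.7')
print(split_event_time("earthquake2023-05-04\xa0\xa0\xa018:30:18.44hr 53min ago"))
# ('2023-05-04', '18:30:18.4')
```

In Python's `re`, `\s` on a `str` pattern also matches the non-breaking space `\xa0`, so the separator does not need to be spelled out.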

In [376]:
soup.select(('tr > td:nth-child(5)'))[0].get_text().strip()

'19.35'

In [398]:
Mag

[]


In [381]:
Dia

['2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-04',
 '2023-05-

In [412]:
data = {
    'Hora': Hora,
    'Dia': Dia,
    'Latitude': Latitude_Degree,
    'N/S': Latitude_Degree_NS,
    'Longitud': Longitude_Degree,
    'W/E': Longitude_Degree_WE,
    'Magnitud': Mag,
    'Ubicacion': Ubicacion,
}

In [409]:
data

{'Hora': ['23:05:56.717 ',
  '23:05:54.017 ',
  '22:51:04.332 ',
  '22:39:30.843 ',
  '22:28:29.054 ',
  '22:20:46.11hr 02 ',
  '22:03:55.51hr 19 ',
  '21:54:29.81hr 28 ',
  '21:49:50.01hr 33 ',
  '21:43:10.01hr 40 ',
  '21:40:35.01hr 42 ',
  '21:37:54.51hr 45 ',
  '21:32:16.01hr 51 ',
  '20:56:30.02hr 26 ',
  '20:41:22.02hr 41 ',
  '20:37:21.72hr 45 ',
  '20:10:09.33hr 13 ',
  '20:00:08.03hr 23 ',
  '19:51:37.03hr 31 ',
  '19:45:32.73hr 37 ',
  '19:43:36.03hr 39 ',
  '19:27:00.53hr 56 ',
  '19:26:52.03hr 56 ',
  '19:17:16.04hr 06 ',
  '19:09:56.04hr 13 ',
  '19:07:03.84hr 16 ',
  '18:56:18.04hr 27 ',
  '18:55:43.04hr 27 ',
  '18:48:38.04hr 34 ',
  '18:30:18.44hr 53 ',
  '18:27:31.04hr 55 ',
  '18:21:33.85hr 01 ',
  '18:15:30.05hr 07 ',
  '17:59:56.95hr 23 ',
  '17:39:23.05hr 43 ',
  '17:35:49.65hr 47 ',
  '17:29:24.95hr 53 ',
  '17:13:06.06hr 10 ',
  '17:02:17.06hr 21 ',
  '16:50:42.06hr 32 ',
  '16:45:02.06hr 38 ',
  '16:36:28.66hr 46 ',
  '16:02:32.07hr 20 ',
  '16:02:27.57hr 20 ',


In [413]:
df = pd.DataFrame(data)

In [414]:
len(Latitude_Degree)

50
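`pd.DataFrame` only accepts a dict of lists when every list has the same length. Re-running the append cells silently changes those lengths: `Ubicacion` held the 20 regions twice at one point above, and `Mag` printed as empty at another. The small guard below, sketched as a hypothetical `build_frame` helper, names the mismatched columns before building the frame.

```python
import pandas as pd


def build_frame(columns: dict) -> pd.DataFrame:
    """Build a DataFrame from a dict of lists, failing loudly on length mismatches."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return pd.DataFrame(columns)


ok = build_frame({"Magnitud": ["3.8", "2.1"], "Ubicacion": ["ISLAND OF HAWAII, HAWAII", "SLOVENIA"]})
print(ok.shape)  # (2, 2)

try:
    build_frame({"Magnitud": [], "Ubicacion": ["SLOVENIA"]})
except ValueError as err:
    print(err)  # column lengths differ: {'Magnitud': 0, 'Ubicacion': 1}
```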

In [415]:
df

|   | Hora | Dia | Latitude | N/S | Longitud | W/E | Magnitud | Ubicacion |
|---|------|-----|----------|-----|----------|-----|----------|-----------|
| 0 | 23:05:56.717 | 2023-05-04 | 19.35 | N | 155.09 | W | 3.8 | ISLAND OF HAWAII, HAWAII |
| 1 | 23:05:54.017 | 2023-05-04 | 46.31 | N | 14.82 | E | 2.1 | SLOVENIA |
| 2 | 22:51:04.332 | 2023-05-04 | 34.61 | N | 25.91 | E | 3.6 | CRETE, GREECE |
| 3 | 22:39:30.843 | 2023-05-04 | 37.74 | N | 15.04 | E | 2.1 | SICILY, ITALY |
| 4 | 22:28:29.054 | 2023-05-04 | 24.67 | N | 94.17 | E | 2.7 | MYANMAR-INDIA BORDER REGION |
| 5 | 22:20:46.11hr 02 | 2023-05-04 | 8.4 | S | 160.3 | E | 5.1 | SOLOMON ISLANDS |
| 6 | 22:03:55.51hr 19 | 2023-05-04 | 44.85 | N | 26.41 | E | 2.7 | ROMANIA |
| 7 | 21:54:29.81hr 28 | 2023-05-04 | 39.19 | N | 40.2 | E | 3.1 | EASTERN TURKEY |
| 8 | 21:49:50.01hr 33 | 2023-05-04 | 21.09 | S | 68.84 | W | 2.7 | TARAPACA, CHILE |
| 9 | 21:43:10.01hr 40 | 2023-05-04 | 0.77 | S | 123.31 | E | 4.4 | SULAWESI, INDONESIA |


In [417]:
df[df['Ubicacion'].str.contains('MEXICO')]

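|    | Hora | Dia | Latitude | N/S | Longitud | W/E | Magnitud | Ubicacion |
|----|------|-----|----------|-----|----------|-----|----------|-----------|
| 34 | 17:39:23.05hr 43 | 2023-05-04 | 16.15 | N | 94.26 | W | 4.2 | OFFSHORE OAXACA, MEXICO |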
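All scraped columns are strings, and the task asks for the 20 latest events. Below is a sketch of a tidy-up step that assumes the column names of `data` above. It converts coordinates to signed decimal degrees (south and west negative, the usual convention) and magnitude to a number. The filter uses `case=False, na=False`, so lowercase queries and missing regions do not break it. The `tidy` helper and the two-row sample frame (copied from rows 0 and 8 above) are illustrative.

```python
import pandas as pd


def tidy(df: pd.DataFrame, n: int = 20) -> pd.DataFrame:
    """Return the first n rows with numeric, signed coordinates and numeric magnitude."""
    out = df.head(n).copy()  # the EMSC table lists the newest events first
    lat = pd.to_numeric(out["Latitude"], errors="coerce")
    lon = pd.to_numeric(out["Longitud"], errors="coerce")
    # South latitudes and West longitudes are negative in signed decimal degrees
    out["Latitude"] = lat.where(out["N/S"] == "N", -lat)
    out["Longitud"] = lon.where(out["W/E"] == "E", -lon)
    out["Magnitud"] = pd.to_numeric(out["Magnitud"], errors="coerce")
    return out.drop(columns=["N/S", "W/E"])


sample = pd.DataFrame({
    "Hora": ["23:05:56.7", "21:49:50.0"],
    "Dia": ["2023-05-04", "2023-05-04"],
    "Latitude": ["19.35", "21.09"],
    "N/S": ["N", "S"],
    "Longitud": ["155.09", "68.84"],
    "W/E": ["W", "W"],
    "Magnitud": ["3.8", "2.7"],
    "Ubicacion": ["ISLAND OF HAWAII, HAWAII", "TARAPACA, CHILE"],
})
clean = tidy(sample)
print(clean[clean["Ubicacion"].str.contains("chile", case=False, na=False)])
```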
