# Rock and Mineral Clubs

Scrape all of the rock and mineral clubs listed at https://rocktumbler.com/blog/rock-and-mineral-clubs/ (but don't just cut and paste!).

Save a CSV called `rock-clubs.csv` with the name of the club, their URL, and the city they're located in.

**Bonus**: Add a column for the state. There are a few ways to do this, but knowing that `element.parent` goes 'up' one element might be helpful.
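
As a rough illustration of going "up" the tree, the sketch below climbs from a club's link to the state heading. The HTML string is a hypothetical, simplified stand-in for the page (one `<section>` per state, with the heading in its own table); the real markup may differ, so treat the structure as an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the page: one <section> per state.
html = """
<section>
  <table><tr><td><h3>Alabama Rock and Mineral Clubs</h3></td></tr></table>
  <table>
    <tr><td><a href="http://huntsvillegms.org/">Huntsville Gem and Mineral Society</a></td><td>Huntsville</td></tr>
  </table>
</section>
"""
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")

# .parent climbs exactly one level: <a> -> <td> -> <tr>
cell = link.parent
row = cell.parent
print(cell.name, row.name)

# find_parent() keeps climbing until it reaches the named tag
section = link.find_parent("section")
heading = section.find("h3").get_text(strip=True)

# Strip the fixed suffix to leave only the state name
state = heading.replace(" Rock and Mineral Clubs", "")
print(state)
```

Chaining `.parent` works when the nesting depth is known; `find_parent()` is a little more forgiving if the depth varies between rows.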

* _**Hint:** The club name and the city are both inside `td` elements, and no class distinguishes them. Instead, ask for all of the `td`s in a row and take the text from the first or second one._
* _**Hint:** If you use BeautifulSoup, you can select elements by attributes other than class or id._
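
Both hints can be sketched on a small sample. The `bgcolor` values below are the ones that appear on the page's state-heading table; the club row is an assumed, simplified shape (name with link in the first cell, city in the second), so adjust it to whatever the real rows look like.

```python
from bs4 import BeautifulSoup

# Small sample: the heading table mirrors the page, the club row is assumed.
html = """
<table bgcolor="#CCCCCC"><tr><td bgcolor="#B9EDB8"><h3>Alaska Rock and Mineral Clubs</h3></td></tr></table>
<table>
  <tr><td><a href="http://matsurockclub.com/">Mat-Su Rock and Mineral Club</a></td><td>Palmer</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Select by an attribute other than class or id: here, the bgcolor of heading cells
header_cells = soup.find_all("td", attrs={"bgcolor": "#B9EDB8"})
print([c.get_text(strip=True) for c in header_cells])

# The name and city cells carry no class, so pick them by position in the row
row = soup.find("a").find_parent("tr")
cells = row.find_all("td")
name = cells[0].get_text(strip=True)   # first td: club name
city = cells[1].get_text(strip=True)   # second td: city
url = row.a["href"]                    # the link inside the row
print(name, city, url)
```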

In [1]:
import requests
import re
import csv
import pandas as pd
from bs4 import BeautifulSoup

In [2]:
url = "https://rocktumbler.com/blog/rock-and-mineral-clubs/"
raw_html = requests.get(url).content
soup_doc = BeautifulSoup(raw_html, "html.parser")
print(type(soup_doc))

<class 'bs4.BeautifulSoup'>


In [3]:
### TEST VIEWS OF DATA
# raw_html
# print(soup_doc)
# print(soup_doc.prettify())

In [4]:
# soup_doc.find_all('h3')
soup_doc.find_all('table')[0]

<table bgcolor="#CCCCCC" cellpadding="4" cellspacing="1" width="100%"><tr><td bgcolor="#B9EDB8">
<h3>Alabama Rock and Mineral Clubs</h3>
</td></tr></table>

In [5]:
### INDIVIDUAL LINE TESTS
everything = soup_doc.find_all('section', limit=100)
# everything[1].a.string
# everything[1].find_all('td')[1].text
# everything[1].a['href']
# everything[1].find_parents()
everything[2].find('h3')
# everything[1].find_next_siblings()

<h3>Alaska Rock and Mineral Clubs</h3>

In [6]:
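### SUCCESSFUL MINIMUM CODE
# Caveat: rows[1:] skips the first row of every table. The state heading sits
# in its own one-row table, so in the club tables this drops the first club
# (e.g. 'Alabama Mineral & Lapidary Society' is absent from the output below).
all_rows = []
all_tables = soup_doc.find_all('table', limit=100)
for mytable in all_tables:
    rows = mytable.find_all('tr')
    for row in rows[1:]:
        # Collect the text of each cell (club name, then city)
        myrow = []
        info = row.find_all('td')
        for data in info:
            myrow.append(data.string)
        # The club's URL is the href of the link in the row
        link = row.a['href']
        myrow.append(link)
        all_rows.append(myrow)
all_rows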

[['Dothan Gem & Mineral Club',
  'Dothan',
  'http://www.wiregrassrockhounds.com/'],
 ['Huntsville Gem and Mineral Society',
  'Huntsville',
  'http://huntsvillegms.org/'],
 ['Mobile Rock & Gem Society', 'Mobile', 'http://www.mobilerockandgem.com/'],
 ['Montgomery Gem & Mineral Society',
  'Montgomery',
  'http://montgomerygemandmineralsociety.com/mgms/'],
 ['Mat-Su Rock and Mineral Club', 'Palmer', 'http://matsurockclub.com/'],
 ['Black Canyon City Rock Club',
  'Black Canyon City',
  'http://www.bccrockclub.mysite.com/'],
 ['Daisy Mountain Rock & Mineral Club', 'Anthem', 'http://www.dmrmc.com/'],
 ['Gila County Gem & Mineral Society', 'Miami', 'http://gilagem.org/'],
 ['Huachuca Mineral and Gem Club',
  'Sierra Vista',
  'http://www.huachucamineralandgemclub.info/'],
 ['Lake Havasu Gem & Mineral Society',
  'Lake Havasu City',
  'http://www.lakehavasugms.org/'],
 ['Mineralogical Society of Arizona', 'Scottsdale', 'http://www.msaaz.org/'],
 ['Mingus Gem & Mineral Club', 'Cottonwood', 

In [7]:
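### SUCCESSFUL STATE-PULLING CODE
state = 'NOSTATE'  # placeholder until the first state heading is seen
all_rows_bonus = [['name','city','state','contact']]
# Each state's <section> holds its heading table plus its club table
all_tables = soup_doc.find_all('section', limit=52)
for mytable in all_tables:
    heading = mytable.find('h3')
    if heading:
        # "Alabama Rock and Mineral Clubs" -> "Alabama"
        state = re.findall(r"(.*) Rock and Mineral Clubs", heading.string)[0]
    rows = mytable.find_all('tr')
    # In a state section the first row is the heading, so clubs start at rows[1]
    for row in rows[1:]:
        myrow = []
        info = row.find_all('td')
        for data in info:
            # str() turns the NavigableString into a plain Python string
            myrow.append(str(data.string))
        myrow.append(state)
        link = row.a['href']
        myrow.append(link)
        all_rows_bonus.append(myrow)
all_rows_bonus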

[['name', 'city', 'state', 'contact'],
 ['Alabama Mineral & Lapidary Society',
  'Birmingham',
  'Alabama',
  'http://www.lapidaryclub.com/'],
 ['Dothan Gem & Mineral Club',
  'Dothan',
  'Alabama',
  'http://www.wiregrassrockhounds.com/'],
 ['Huntsville Gem and Mineral Society',
  'Huntsville',
  'Alabama',
  'http://huntsvillegms.org/'],
 ['Mobile Rock & Gem Society',
  'Mobile',
  'Alabama',
  'http://www.mobilerockandgem.com/'],
 ['Montgomery Gem & Mineral Society',
  'Montgomery',
  'Alabama',
  'http://montgomerygemandmineralsociety.com/mgms/'],
 ['Chugach Gem & Mineral Society',
  'Anchorage',
  'Alaska',
  'http://www.chugachgemandmineralsociety.com/'],
 ['Mat-Su Rock and Mineral Club',
  'Palmer',
  'Alaska',
  'http://matsurockclub.com/'],
 ['Apache Junction Rock and Gem Club',
  'Apache Junction',
  'Arizona',
  'http://www.ajrockclub.com/'],
 ['Black Canyon City Rock Club',
  'Black Canyon City',
  'Arizona',
  'http://www.bccrockclub.mysite.com/'],
 ['Daisy Mountain Rock &

In [8]:
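# newline='' keeps the csv module from writing blank lines between rows on Windows
with open('08-rock-and-mineral-clubs-cleaned.csv', 'w', newline='', encoding='utf-8') as outputfile:
    writer = csv.writer(outputfile)
    writer.writerows(all_rows_bonus)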
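
If the rows are kept as dictionaries rather than lists, `csv.DictWriter` is one alternative that ties each value to its column name. This is only a sketch: the two sample rows are taken from the output above, the `clubs` list and the temporary directory are illustrative choices, and the file name follows the assignment text rather than the cell above.

```python
import csv
import os
import tempfile

# Two sample rows from the scraped output, stored as dictionaries
clubs = [
    {"name": "Mobile Rock & Gem Society", "city": "Mobile",
     "state": "Alabama", "contact": "http://www.mobilerockandgem.com/"},
    {"name": "Mat-Su Rock and Mineral Club", "city": "Palmer",
     "state": "Alaska", "contact": "http://matsurockclub.com/"},
]

# Write into a temporary directory so the sketch never overwrites real output
path = os.path.join(tempfile.mkdtemp(), "rock-clubs.csv")

with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "city", "state", "contact"])
    writer.writeheader()      # header row comes from fieldnames
    writer.writerows(clubs)   # one CSV row per dictionary

# Read the file back to confirm the round trip
with open(path, newline="", encoding="utf-8") as f:
    rows_back = list(csv.DictReader(f))
print(len(rows_back), rows_back[0]["state"])
```

With a plain `csv.writer`, the header list and each row list have to be kept in the same order by hand; `DictWriter` takes the order from `fieldnames` instead.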

In [9]:
df = pd.read_csv("08-rock-and-mineral-clubs-cleaned.csv")
df.head()

|   | name | city | state | contact |
|---|------|------|-------|---------|
| 0 | Alabama Mineral & Lapidary Society | Birmingham | Alabama | http://www.lapidaryclub.com/ |
| 1 | Dothan Gem & Mineral Club | Dothan | Alabama | http://www.wiregrassrockhounds.com/ |
| 2 | Huntsville Gem and Mineral Society | Huntsville | Alabama | http://huntsvillegms.org/ |
| 3 | Mobile Rock & Gem Society | Mobile | Alabama | http://www.mobilerockandgem.com/ |
| 4 | Montgomery Gem & Mineral Society | Montgomery | Alabama | http://montgomerygemandmineralsociety.com/mgms/ |


In [10]:
df.shape

(484, 4)