# Nuclear Market Scrape

Finding opportunities in the specialised field of nuclear monitoring is a little different from the mainstream marketing campaigns of large software providers or cloud hosting giants, but there is one place a manufacturer can scour for potential new business: https://www.nuclearmarket.com/.

Unfortunately, many opportunities slipped under the radar, and keeping up to date was a full-time job. So, during my time as a Sales Engineer, I wrote this script to scrape all relevant opportunities matching our search criteria, which I could then validate and send to our sales team.

Key Demonstration: scraping web data, working with HTML tags, cleaning and filtering data, regular expressions, lambda functions
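Before the full script, here is a minimal offline sketch of the same idea: pull the text out of every `a` tag and keep only the titles that match a keyword list. It is not the script itself. It assumes a hypothetical `SAMPLE_HTML` snippet in place of the live listing page, uses the standard-library `html.parser` instead of BeautifulSoup so it runs without extra installs, and the names `AnchorTextParser` and `find_matches` are illustrative only.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup standing in for the listing page (not real site data).
SAMPLE_HTML = """
<ul>
  <li><a href="/proc/1">Gamma Detector Spares</a></li>
  <li><a href="/proc/2">Office Furniture Supply</a></li>
  <li><a href="/proc/3">Neutron Flux Monitoring &amp; Cabling</a></li>
</ul>
"""

# A shortened, illustrative keyword list.
KEYWORDS = ["DETECTOR", "MONITORING", "NEUTRON", "GAMMA"]


class AnchorTextParser(HTMLParser):
    """Collect the text content of every <a> element."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into plain text.
        super().__init__(convert_charrefs=True)
        self.titles = []
        self._in_anchor = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_anchor:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            self.titles.append("".join(self._buffer).strip())
            self._in_anchor = False


def find_matches(markup, keywords):
    """Return the anchor titles that contain at least one keyword."""
    parser = AnchorTextParser()
    parser.feed(markup)
    parser.close()
    # re.escape guards against keywords that contain regex metacharacters;
    # IGNORECASE avoids having to upper-case the titles first.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [title for title in parser.titles if pattern.search(title)]


matches = find_matches(SAMPLE_HTML, KEYWORDS)
print(matches)
```

Matching case-insensitively keeps the original capitalisation of each title, and letting the parser decode entities means `&amp;` comes through as a plain `&` instead of leaking into the results.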

In [37]:
import re
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
#Show full column text; None replaces the deprecated -1 in newer pandas versions
pd.set_option('display.max_colwidth', None)

#Grab HTML from website (fail fast on a hung connection or an HTTP error)
response = requests.get("https://www.nuclearmarket.com/proc/ListProc1.cfm", timeout=30)
response.raise_for_status()

#import into HTML manager BeautifulSoup
soup = BeautifulSoup(response.text, "lxml" )

#Find all HTML with HTML tag "a", these are the opportunity titles
#Build the DataFrame (table) directly from them, one row per link, in column 0
#(avoids a fixed-size table pre-filled with uninitialised values)
df = pd.DataFrame({0: [str(i.contents) for i in soup.find_all("a")]})

#convert to upper for cleaning
df_upper = df.apply(lambda x: x.astype(str).str.upper())

#What title keywords do we want to filter for?
keywords = ['DOSIMETERS','DETECTOR','RADIATION','MONITORING','GEIGER','FISSION','CHAMBER','MONITOR','ULTRA','TUBE','CABLE','NEUTRON','ALPHA','BETA','GAMMA','PHOTON']

find = df_upper[0].str.contains('|'.join(keywords))
Centronic_opportinuties = df_upper[find]

#clean HTML: strip anything that looks like a tag
def cleanhtml(raw_html):
    cleanr = re.compile('<.*?>')
    cleantext = re.sub(cleanr, '', raw_html)
    return cleantext

Centronic_opportinuties = Centronic_opportinuties.iloc[:,0].apply(cleanhtml)

#print opportunities
Centronic_opportinuties

20    [RADIATION EXPERIMENT KIT FOR NUCLEAR SCIENCE TECHNOLOGY EDUCATION (10 COUNTRIES)]
23    [HAND-FOOT-CLOTHES CONTAMINATION MONITORS]
29    [PLASMA EQUIPMENT MONITORING SYSTEM DEVELOPMENT, I&AMP;C, DATA ACQUISITION SYSTEM]
36    [DEDICATED MULTICHANNEL ANALYZER &AMP; SHIELDED SODIUM IODIDE DETECTOR]
37    [MULTI CHANNEL ANALYSER [TO BE USED IN CONNECTION TO HIGH PURITY GERMANIUM, SILICON, CADMIUM ZINC 