### Introduction
This notebook scrapes justia.com with Python and BeautifulSoup. The goal is to collect information about New Jersey lawyers from the first page of listings (name, short bio, specialization, university attended, address, phone, and email link) and store it in a .csv file.

#### Imports

In [1]:
#load libraries
from bs4 import BeautifulSoup
import requests
import time
import pandas as pd

In [2]:
#header
headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'}
#get url
website='https://www.justia.com/lawyers/new-jersey'
#get request
response=requests.get(website, headers=headers)
#display the response object, which shows the HTTP status code
response

<Response [200]>
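
Inspecting the response by eye works in a notebook, but a script may want to fail loudly on a bad status. The following is a minimal sketch, not part of the original notebook: `fetch_html` is a hypothetical helper, the 10-second timeout is an arbitrary choice, and the `get` parameter exists only so the function can be exercised without a network connection.

```python
import requests

def fetch_html(url, headers, timeout=10, get=requests.get):
    """Fetch a page and return its body as bytes, raising on HTTP errors.

    `get` defaults to requests.get; it is a hypothetical hook that lets a
    stand-in be passed for offline testing.
    """
    response = get(url, headers=headers, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.content
```

With this helper, the soup object could be built as `BeautifulSoup(fetch_html(website, headers), 'html.parser')`.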

In [3]:
#soup object
soup=BeautifulSoup(response.content, 'html.parser')

In [4]:
#Results
results=soup.find_all('div',{'data-vars-action':'OrganicListing'})
len(results)

40

In [5]:
#create empty lists for the target data
name=[]
bio=[]
specialization=[]
university=[]
address=[]
phone=[]
email=[]
#loop through the results and append the data to the list
for result in results:
    #find() returns None when a field is missing, so the chained call raises AttributeError
    #name
    try:
        name.append(result.find('strong',{'class':'lawyer-name'}).get_text().strip())
    except AttributeError:
        name.append('')
    #bio
    try:
        bio.append(result.find('div', {'class':'lawyer-expl'}).get_text().strip())
    except AttributeError:
        bio.append('')
    #specialization
    try:
        specialization.append(result.find('span', {'class': '-practices'}).get_text())
    except AttributeError:
        specialization.append('')
    #university
    try:
        university.append(result.find('span', {'class':'-law-schools'}).get_text())
    except AttributeError:
        university.append('')
    #address
    try:
        address.append(result.find('span', {'class':'-address'}).get_text().strip().replace('\t','').replace('\n',''))
    except AttributeError:
        address.append('')
    #phone
    try:
        phone.append(result.find('strong',{'class':'-phone'}).get_text().strip())
    except AttributeError:
        phone.append('')
    #email
    try:
        email.append(result.find('a', {'class':'-email'}).get('href'))
    except AttributeError:
        email.append('')
#create a Pandas dataframe to store the output
df=pd.DataFrame({'Lawyer_name':name,'Bio':bio,'Specialization':specialization,'University':university,'Address': address,
                 'Phone':phone,'Email':email})
df.head()

| | Lawyer_name | Bio | Specialization | University | Address | Phone | Email |
|---|---|---|---|---|---|---|---|
| 0 | Raymond Lahoud | Bridgewater, NJ Lawyer with 12 years of experi... | Immigration | Georgetown University Law Center | 400 Crossing Blvd.8th FloorBridgewater,NJ 08807 | (888) 440-4872 | https://lawyers.justia.com/lawyer/raymond-laho... |
| 1 | Jonathan D. Marx | Marlton, NJ Attorney with 36 years of experience | Personal Injury | Brooklyn Law School | 10000 Lincoln Dr ESuite #201Marlton,NJ 08053 | (856) 671-1529 | https://lawyers.justia.com/lawyer/jonathan-d-m... |
| 2 | Emmanuel Coffy | 10.0 (1 Peer Review) | IP | Seton Hall University School of Law | 515 Valley StreetSuite 1Maplewood,NJ 07040 | (800) 576-4320 | https://lawyers.justia.com/lawyer/emmanuel-cof... |
| 3 | Jason Seidman | Toms River, NJ Attorney with 12 years of exper... | Criminal, DWI, Juvenile and Traffic Tickets | University of the District of Columbia and Laf... | 10 Allen St #2DToms River,NJ 08753 | (732) 279-7649 | https://lawyers.justia.com/lawyer/jason-seidma... |
| 4 | Matthew J Hartnett | Collingswood, NJ Lawyer | Immigration | Rutgers School of Law-Camden | 1007 Haddon AvenueCollingswood,NJ 08108 | (215) 437-0264 | https://lawyers.justia.com/lawyer/matthew-j-ha... |
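
The addresses in this output run together (for example `Blvd.8th FloorBridgewater`) because tabs and newlines are deleted rather than replaced with a space. One possible way to handle this, sketched below, is a small helper that collapses whitespace and also replaces the repeated `try`/`except` blocks. `field_text` is a hypothetical name, and the sample markup is invented for illustration; it is only assumed to resemble the real listing structure.

```python
from bs4 import BeautifulSoup

def field_text(card, tag, css_class):
    """Return the whitespace-collapsed text of the first match, or '' if absent."""
    node = card.find(tag, {'class': css_class})
    if node is None:
        return ''
    # get_text(' ') joins text fragments with a space; split() with no
    # arguments then breaks on any run of spaces, tabs, or newlines
    return ' '.join(node.get_text(' ').split())

# Hypothetical listing markup, for illustration only
sample_html = """
<div data-vars-action="OrganicListing">
  <strong class="lawyer-name"> Jane Doe </strong>
  <span class="-address">
    1 Example St.<br/>Suite 2<br/>
    Sampletown, NJ 00000
  </span>
</div>
"""

card = BeautifulSoup(sample_html, 'html.parser').find(
    'div', {'data-vars-action': 'OrganicListing'})
name = field_text(card, 'strong', 'lawyer-name')
address = field_text(card, 'span', '-address')
phone = field_text(card, 'strong', '-phone')  # no phone in the sample, so ''
```

Inside the loop, each field would then reduce to a single line such as `address.append(field_text(result, 'span', '-address'))`.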


In [6]:
# store output in a .csv file
df.to_csv('nj_lawyers.csv', index=False)