# Web Scraping: Scrape data of psychiatrists available in Pune district
Submitted by: Rupali Jain

Email: rupalisumatijain@gmail.com

We will scrape the data of 50 psychiatrists in Pune district from **www.practo.com** and export it to an Excel sheet.
Note: The doctors' data is a static snapshot (not real time); the listings on the website are subject to change.

The data includes:
1. Doctor's name
2. Hospital/Clinic name
3. Locality

Note: Doctors' contact numbers, e-mail addresses and websites are not available on Practo.


### Step 1: Scrape data from website

We first apply the website's filters to list all psychiatrists in Pune. This returns 8 pages of results, so the scraping code has to run on each of these 8 pages.

Then, using the CSS class names of the relevant HTML tags in the page source, we select only the elements we need.
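To illustrate class-based selection, here is a minimal sketch run on a small hand-written HTML snippet. The markup is an assumption and does not reproduce Practo's real page; only the class names are taken from the notebook. It uses BeautifulSoup's built-in `"html.parser"` so `lxml` is not required. Note that a selector with no space between classes, such as `.u-c-pointer.u-t-hover-underline`, matches only elements carrying *both* classes.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: only the class names come from the notebook
html = """
<div class="listing">
  <h2 class="doctor-name">Dr. A</h2>
  <span class="u-c-pointer u-t-hover-underline">Clinic A</span>
</div>
<div class="listing">
  <h2 class="doctor-name">Dr. B</h2>
  <span class="u-c-pointer">Not matched: has only one of the two classes</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# '.doctor-name' matches every element that has this class
names = [tag.get_text(strip=True) for tag in soup.select(".doctor-name")]

# '.a.b' (no space) matches elements that have BOTH classes
clinics = [tag.get_text(strip=True)
           for tag in soup.select(".u-c-pointer.u-t-hover-underline")]

print(names)    # ['Dr. A', 'Dr. B']
print(clinics)  # ['Clinic A']
```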

In [1]:
# import necessary libraries

import requests  # to download the HTML of each result page
import bs4  # to parse the HTML (BeautifulSoup)
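The loop in cell 3 calls `requests.get(url)` directly. A minimal sketch of a more defensive fetch is shown here: it sets a timeout, raises on HTTP errors instead of parsing an error page, and pauses between requests. The User-Agent string, the one-second delay and the ten-second timeout are illustrative assumptions, not values required by Practo.

```python
import time

import requests


def fetch_html(url, session=None, delay=1.0, timeout=10):
    """Fetch one page and return its HTML text.

    Sketch only: the User-Agent, delay and timeout are assumed values.
    """
    session = session or requests.Session()
    response = session.get(
        url,
        headers={"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"},  # assumed
        timeout=timeout,  # give up on a stalled request instead of hanging
    )
    response.raise_for_status()  # stop on 4xx/5xx rather than parse an error page
    time.sleep(delay)  # pause so consecutive page requests are spaced out
    return response.text


# Example (needs network access):
# html = fetch_html("https://www.practo.com/pune/psychiatrist?page=1")
```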

In [2]:
# build the URLs of the 8 result pages
urls=[]
for i in range(1,9):
    urls.append('https://www.practo.com/pune/psychiatrist?page={}'.format(i))
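The page count of 8 is hard-coded from the filtered search at the time of scraping. As an alternative, the sketch below keeps requesting pages until one yields no results, with a safety cap. It assumes that a page past the last one returns an empty listing (rather than an error or a redirect), which has not been verified for Practo. `fetch_page` and `parse_names` are hypothetical callables you would supply, for example a `requests.get` on the URL pattern above and a `soup.select('.doctor-name')` extraction.

```python
def scrape_until_empty(fetch_page, parse_names, max_pages=50):
    """Collect results page by page until a page yields nothing.

    fetch_page(n)     -> raw content of result page n (hypothetical helper)
    parse_names(page) -> list of items found on that page (hypothetical helper)
    max_pages caps the loop so a site change cannot make it run forever.
    """
    collected = []
    for page_number in range(1, max_pages + 1):
        items = parse_names(fetch_page(page_number))
        if not items:  # assumed signal that we are past the last page
            break
        collected.extend(items)
    return collected


# Offline demo with fake pages: page 3 has no entries, so the loop stops there
fake_pages = {1: ["Dr. A", "Dr. B"], 2: ["Dr. C"]}
print(scrape_until_empty(lambda n: n, lambda page: fake_pages.get(page, [])))
# ['Dr. A', 'Dr. B', 'Dr. C']
```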

In [3]:
#initialising empty lists for each column
doctornames=[]
clinicnames=[]
locality=[]

# loop over the 8 result pages and collect the data from each one
for url in urls:
    practo=requests.get(url, timeout=10)  # timeout so a stalled request cannot hang the loop
    practo.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    soup=bs4.BeautifulSoup(practo.text,'lxml')  # the 'lxml' parser needs the lxml package installed
    
    # list of all doctors' names
    alldoctornames=soup.select('.doctor-name')
    for doctor in alldoctornames:
        doctornames.append(doctor.getText())
        
    # list of clinic/hospital names corresponding to the doctors
    allclinicdump=soup.select('.u-c-pointer.u-t-hover-underline')
    clinicnamestemp=[]
    for data in allclinicdump:
        clinicnamestemp.append(data.getText())  # this also picks up unwanted entries, so it needs cleaning
    for clinic in clinicnamestemp[1:]:  # skip the first match on each page (unwanted data)
        if 'more' not in clinic:  # drop the extra entries whose text contains 'more'
            clinicnames.append(clinic)
            
    # list of each doctor's locality: keep only the text before the first comma
    alllocality=soup.select('.u-bold.u-d-inlineblock.u-valign--middle')
    for place in alllocality:
        locality.append(place.getText().split(',')[0])
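The cell above builds three independent lists and relies on them lining up by position; one missing clinic name on a page would shift every later row. A sketch of an alternative is to extract one record per listing card, so a missing field becomes `None` instead of misaligning the columns. The wrapper class `listing` and the sample HTML are assumptions (Practo's real card markup is not reproduced here); the inner class names are those used in the notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "listing" is an ASSUMED wrapper class for one doctor's card
SAMPLE_HTML = """
<div class="listing">
  <h2 class="doctor-name">Dr. A</h2>
  <a class="u-c-pointer u-t-hover-underline">Clinic A</a>
  <span class="u-bold u-d-inlineblock u-valign--middle">Baner, Pune</span>
</div>
<div class="listing">
  <h2 class="doctor-name">Dr. B</h2>
  <span class="u-bold u-d-inlineblock u-valign--middle">Aundh, Pune</span>
</div>
"""


def text_or_none(card, selector):
    """Return the stripped text of the first match inside card, or None."""
    tag = card.select_one(selector)
    return tag.get_text(strip=True) if tag else None


def parse_cards(html, card_selector=".listing"):
    """Build one record per card so the three fields always stay paired."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for card in soup.select(card_selector):
        place = text_or_none(card, ".u-bold.u-d-inlineblock.u-valign--middle")
        records.append({
            "Doctor's name": text_or_none(card, ".doctor-name"),
            "Hospital/Clinic": text_or_none(card, ".u-c-pointer.u-t-hover-underline"),
            # keep the text before the first comma, as the notebook does
            "Locality": place.split(",")[0].strip() if place else None,
        })
    return records


for record in parse_cards(SAMPLE_HTML):
    print(record)
```

A list of such dicts can be passed straight to `pd.DataFrame(records)`.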

In [4]:
len(doctornames)

72

In [5]:
doctornames

['Dr. Roshita Khare',
 'Dr. Rajesh Nalawade',
 'Dr. Pankaj B Borade',
 'Dr. Vejaya Goyal',
 'Dr. Anjendra R Targe',
 'Dr. Ashwini Kulkarni',
 'Dr. Ninad Baste',
 'Dr. Dnyanda Nakul Deshpande',
 'Dr. Yogesh Pokale',
 'Dr. Bhushan Chaudhari',
 'Dr. Prakash Bhambure',
 'Dr. Anjendra R Targe',
 'Dr. Ashwini Kulkarni',
 'Dr. Manish Bajpayee',
 'Dr. Samiksha Murkute',
 'Dr. Amod Borkar',
 'Dr. Rishikesh Behere',
 'Dr. Niket Kasar',
 'Dr. Trupti Vedpathak',
 'Dr. Ninad Baste',
 'Dr. Rashmi Patil',
 'Dr. Sandeep Mahamuni',
 'Dr. Trupti Vedpathak',
 'Dr. Sumit Chandak',
 'Dr. Neha Gupta',
 'Dr. Bhushan Mhetre',
 'Dr. Nikhil Kanase',
 'Dr. Dhananjay Vasant Neel',
 'Dr. Vinayak Jarhad',
 'Dr. Jaideep Patil',
 'Dr. Nikhil Mankar',
 'Dr. Sonia Malhotra',
 'Dr. Swati Joshi',
 'Dr. Swapnil Deshmukh',
 'Dr. Vrushali S. Shenoy',
 'Dr. M S V K Raju',
 'Dr. Bharat Sarode',
 'Dr. Ananya Chitale',
 'Dr. Ankit Patel',
 'Dr. Gaurav Wadgaonkar',
 'Dr. Ramdas Ransing',
 'Dr. Kishor V. Jadhavar',
 'Dr. Sadashiv

In [6]:
len(clinicnames)

72

In [7]:
clinicnames

['Mnas Clinic',
 'Sexology and Stress Management Clinic',
 'Mind Matters Clinic',
 'Chaitanya Nursing Home',
 'A De-Stress Mind and Sex Clinic',
 'Advait Mindwin Clinic',
 'Jupiter Hospital',
 'Smart Psychiatry Institute',
 'Mann Swasthya Psychiatry Clinic',
 'Dr. Bhushan Chaudhari Clinic',
 'Dr Prakash Bhambure Clinic',
 'A De-Stress Mind and Sex Clinic',
 'Advait Mindwin Clinic',
 'Dr. Manish Bajpayee Clinic',
 'Manomay Homeopathy and Counselling Center',
 'Formative Minds Clinic',
 'Manoshanti',
 'Dr Niket Kasar Clinic',
 'Psychwellness Clinic',
 'Jupiter Hospital',
 'Orchid Hospital',
 'Mahamuni Clinic',
 'Psychwellness Clinic',
 'The Beautiful Mind',
 'Bharti Children & Cardio Diabetic Hospital',
 'We Heal Polyclinic',
 'Doctor To Home & Clinic',
 'Nihar Dental Clinic',
 'Oasis Counsellors',
 'Mind Management Center',
 'We Heal Polyclinic',
 'KEM Hospital',
 'Sneh Psychiatry Clinic',
 'Smart Neuro-psychiatry Clinic',
 'Healthbox Multispeciality Clinic',
 'Prashanti Clinic',
 'Aast

In [8]:
len(locality)

72
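All three lists have 72 entries, which the DataFrame in Step 2 needs because it pairs them by position. A small sketch of an explicit check that fails early with a readable message is below; `check_same_length` is a hypothetical helper. pandas would also reject lists of different lengths, and equal lengths alone do not prove that the rows are correctly paired.

```python
def check_same_length(**columns):
    """Return the shared length of all column lists, or raise ValueError."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)


# Placeholder lists; in the notebook these would be doctornames, clinicnames, locality
print(check_same_length(doctornames=["Dr. A", "Dr. B"],
                        clinicnames=["Clinic A", "Clinic B"],
                        locality=["Baner", "Aundh"]))  # 2
```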

In [9]:
locality

['Baner',
 'Nigdi',
 'Camp',
 'Vishrantwadi',
 'Kalyani Nagar',
 'Aundh T.S.',
 'Baner',
 'Bund Garden',
 'Bhosari',
 'Pimpri-Chinchwad',
 'Satara Road',
 'Kalyani Nagar',
 'Aundh T.S.',
 'Kondhwa',
 'Pimple Saudagar',
 'Bavdhan',
 'Baner',
 'Tilak Road',
 'Baner',
 'Baner',
 'Lohegaon',
 'Tilak Road',
 'Baner',
 'Baner',
 'Pimple Saudagar',
 'Warje',
 'Pimple Saudagar',
 'Kharadi',
 'Sangavi',
 'Pimple Gurav',
 'Warje',
 'Rasta Peth',
 'Kothrud',
 'Bund Garden',
 'Pimple Saudagar',
 'Undri',
 'Pimpri-Chinchwad',
 'Pimple Saudagar',
 'Koregaon Park',
 'Pimple Saudagar',
 'Shikrapur',
 'Hadapsar',
 'Viman Nagar',
 'Kharadi',
 'Pimpri-Chinchwad',
 'Yerwada',
 'Kothrud',
 'Prabhat Road',
 'Katraj',
 'Dhole Patil Road',
 'Aundh',
 'Wadgaon Sheri',
 'Aundh T.S.',
 'Baner Road',
 'Sadashiv Peth',
 'Kondhwa',
 'Aundh',
 'Bavdhan',
 'Ambegaon',
 'Deccan Gymkhana',
 'Karve Nagar',
 'Pimpri-Chinchwad',
 'Nigdi',
 'Wakad',
 'Narayangaon',
 'Bund Garden',
 'Kalyani Nagar',
 'Talegaon',
 'Aundh',
 

### Step 2: Export data to Excel sheet

We create a DataFrame with one column per list (doctornames, clinicnames, locality) and export it to an Excel sheet named 'Psychiatrists.xlsx'.
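The cells below add the columns one at a time to an empty DataFrame. An equivalent sketch builds it in a single constructor call from a dict of lists, where the keys become column names and the list order becomes the row order. The short lists are placeholders standing in for the scraped ones; `pd.DataFrame` raises `ValueError` if the lists differ in length.

```python
import pandas as pd

# Placeholder lists standing in for the scraped doctornames, clinicnames, locality
doctornames = ["Dr. A", "Dr. B"]
clinicnames = ["Clinic A", "Clinic B"]
locality = ["Baner", "Aundh"]

# dict keys -> column names, list positions -> rows
psydoc = pd.DataFrame({
    "Doctor's name": doctornames,
    "Hospital/Clinic": clinicnames,
    "Locality": locality,
})
print(psydoc)
```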

In [10]:
# import pandas for the DataFrame
import pandas as pd

In [11]:
#create a dataframe
psydoc=pd.DataFrame()

In [12]:
# add each of the lists created in Step 1 as a column of the DataFrame 'psydoc'
psydoc["Doctor's name"]=doctornames
psydoc['Hospital/Clinic']=clinicnames
psydoc['Locality']=locality

In [13]:
# the full DataFrame holds all 72 entries scraped from the site; keep only the first 50
psydoc[:50]

| | Doctor's name | Hospital/Clinic | Locality |
|---|---|---|---|
| 0 | Dr. Roshita Khare | Mnas Clinic | Baner |
| 1 | Dr. Rajesh Nalawade | Sexology and Stress Management Clinic | Nigdi |
| 2 | Dr. Pankaj B Borade | Mind Matters Clinic | Camp |
| 3 | Dr. Vejaya Goyal | Chaitanya Nursing Home | Vishrantwadi |
| 4 | Dr. Anjendra R Targe | A De-Stress Mind and Sex Clinic | Kalyani Nagar |
| 5 | Dr. Ashwini Kulkarni | Advait Mindwin Clinic | Aundh T.S. |
| 6 | Dr. Ninad Baste | Jupiter Hospital | Baner |
| 7 | Dr. Dnyanda Nakul Deshpande | Smart Psychiatry Institute | Bund Garden |
| 8 | Dr. Yogesh Pokale | Mann Swasthya Psychiatry Clinic | Bhosari |
| 9 | Dr. Bhushan Chaudhari | Dr. Bhushan Chaudhari Clinic | Pimpri-Chinchwad |
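In the doctornames output in Step 1, Dr. Anjendra R Targe, Dr. Ashwini Kulkarni, Dr. Ninad Baste and Dr. Trupti Vedpathak each appear twice with the same clinic and locality, so `psydoc[:50]` holds fewer than 50 distinct doctors. A sketch of dropping exact duplicate rows before taking the first 50 uses pandas `drop_duplicates`; the small frame below uses placeholder values.

```python
import pandas as pd

# Placeholder frame with one repeated listing
psydoc = pd.DataFrame({
    "Doctor's name": ["Dr. A", "Dr. B", "Dr. A", "Dr. C"],
    "Hospital/Clinic": ["Clinic A", "Clinic B", "Clinic A", "Clinic C"],
    "Locality": ["Baner", "Aundh", "Baner", "Camp"],
})

# Remove rows identical to an earlier row (keep the first), then renumber the index
unique_docs = psydoc.drop_duplicates(keep="first").reset_index(drop=True)

top = unique_docs.head(50)  # first 50 distinct rows, or fewer if there are not enough
print(len(psydoc), len(unique_docs))  # 4 3
```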


Now we export the first 50 rows of 'psydoc' to 'Psychiatrists.xlsx'.

In [15]:
# export the first 50 rows to Excel (writing .xlsx needs an engine such as openpyxl)
psydoc[:50].to_excel('Psychiatrists.xlsx',sheet_name='Psydoc',index=False)

Link to the Google Sheet with the data:
<https://docs.google.com/spreadsheets/d/1s5d5MR7x6zN_nEKeUkB4JbMiySt_dp2SWdzHeuS1r8I/edit?usp=sharing>