# Python Project - Web Scraping Data from Instagram Automatically with Selenium Library & View Someone's Profile

## Author - Tanmaya Sekhar Swain 

## Selenium Library - The selenium package is used to automate web browser interaction from Python.

##### Selenium is an open-source tool that automates web browsers. It provides a single interface that lets you write test scripts in programming languages like Ruby, Java, NodeJS, PHP, Perl, Python, and C#, among others.

## Why Learn Selenium with Python?
- **Open source and portable** – Selenium is an open-source, portable web testing framework.
- **Combination of tools and DSL** – Selenium combines a set of tools with a DSL (Domain Specific Language) to carry out various types of tests.
- **Easier to understand and implement** – Selenium commands are grouped into classes, which makes them easier to understand and implement.
- **Less burden and stress for testers** – Automation cuts the time needed to repeat the same test scenarios on every new build to almost zero, which reduces the tester's workload.
- **Cost reduction for business clients** – Automated testing reduces the tester hours a business has to pay for, so it saves money as well as time.

# Objective Of the Project - AUTOMATIC IMAGE EXTRACTION

**For this web scraping you need to download the WebDriver for the browser you want to use to open Instagram.**

## Download ChromeDriver
### As I am using the Chrome browser, I download ChromeDriver as the Selenium driver

Download the latest stable release of ChromeDriver from:
https://chromedriver.chromium.org/

## Importing Important Libraries

In [1]:
# install the selenium library
# !pip install selenium
# import the webdriver and the helper classes used in this notebook
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import time  # used for fixed pauses with time.sleep()

## Open Instagram Page & Login to Your Instagram Account

### Open Instagram Page on Webdriver

In [2]:
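# specify the path to chromedriver.exe (download and save it on your computer)
# note: passing the path positionally works on Selenium 3; newer Selenium 4 releases expect a Service object instead
driver = webdriver.Chrome('D:/Study/Projects/Instagram/chromedriver.exe')

# open the webpage
driver.get("http://www.instagram.com")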
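
On newer Selenium 4 releases the driver path is passed through a `Service` object rather than as a positional argument. The following is a minimal sketch, assuming Selenium 4 is installed. The helper name `make_chrome_driver` is hypothetical, and the imports sit inside the function only so that the snippet can be defined even where Selenium is not installed.

```python
def make_chrome_driver(driver_path):
    """Start Chrome through the ChromeDriver binary at driver_path (Selenium 4 style)."""
    # Imported lazily so that defining this helper does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    service = Service(executable_path=driver_path)
    return webdriver.Chrome(service=service)


# Example (uncomment to launch a real browser):
# driver = make_chrome_driver('D:/Study/Projects/Instagram/chromedriver.exe')
# driver.get("http://www.instagram.com")
```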

### Login to Instagram Account

#### Here we use CSS_SELECTOR
* A CSS selector combines an element selector with a selector value to identify particular elements on a web page. Like XPath, a CSS selector can locate web elements that have no ID, class, or name.
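
The selectors used in this notebook all follow the `tag[attribute='value']` pattern. As a small illustration, the hypothetical helper `css_attr_selector` below builds such strings; it is not part of Selenium, just a sketch of how the selector text is put together.

```python
def css_attr_selector(tag, attribute, value):
    """Build a CSS selector matching <tag> elements whose attribute equals value."""
    return f"{tag}[{attribute}='{value}']"


print(css_attr_selector("input", "name", "username"))  # input[name='username']
print(css_attr_selector("button", "type", "submit"))   # button[type='submit']
```

The resulting string is what gets passed as the second item of the `(By.CSS_SELECTOR, ...)` tuple.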

In [3]:
# target the username and password fields
username = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='username']")))
password = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='password']")))

# enter username and password
username.clear()
username.send_keys("my_username")  # replace my_username with your username
password.clear()
password.send_keys("my_password")  # replace my_password with your password

# target the login button and click it
# (click() returns None, so its result is not stored)
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[type='submit']"))).click()
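
Typing real credentials into a notebook cell makes them easy to leak when the notebook is shared. One possible alternative, sketched here under the assumption that you export two environment variables beforehand, is to read them at run time. The variable names `INSTAGRAM_USERNAME` and `INSTAGRAM_PASSWORD` and the helper `load_credentials` are hypothetical.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from environment variables."""
    # INSTAGRAM_USERNAME / INSTAGRAM_PASSWORD are hypothetical variable names.
    try:
        return env["INSTAGRAM_USERNAME"], env["INSTAGRAM_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Set the environment variable {missing} first") from None


# Usage with the fields located above:
# user, pwd = load_credentials()
# username.send_keys(user)
# password.send_keys(pwd)
```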


# Boom! We are logged in...

### Handle Alerts
* You might get only one alert, or you might get two; adjust the two cells below accordingly.

In [4]:
time.sleep(5) # Wait for 5 seconds
# A pop-up now asks "Save your login info?". Choose the button that suits you.
# To click the "Not Now" option:

# target the Not Now button and click it
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,"//button[contains(text(), 'Not Now')]"))).click()

# To click the "Save Info" option instead:

# target the Save Info button and click it
#WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,"//button[contains(text(), 'Save Info')]"))).click()


In [5]:
time.sleep(5) # Wait for 5 seconds
# A pop-up now asks "Turn on Notifications?". Choose the button that suits you.
# To click the "Not Now" option:

# target the Not Now button and click it
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,"//button[contains(text(), 'Not Now')]"))).click()

# To click the "Turn On" option instead:

# target the Turn On button and click it
#WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,"//button[contains(text(), 'Turn On')]"))).click()
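
Because one or both pop-ups may not appear, a click that tolerates a missing button avoids editing the cells by hand. The sketch below uses a hypothetical helper, `click_if_present`, that runs any click action and reports whether it succeeded. The commented usage assumes Selenium's `TimeoutException`, which `WebDriverWait.until` raises when the wait expires.

```python
def click_if_present(click_action, missing_errors=(Exception,)):
    """Run click_action; return True on success, False if the element never appeared."""
    try:
        click_action()
    except missing_errors:
        return False
    return True


# Usage with the notebook's driver (assumes Selenium is installed):
# from selenium.common.exceptions import TimeoutException
# for _ in range(2):  # at most two pop-ups are expected
#     click_if_present(
#         lambda: WebDriverWait(driver, 10).until(
#             EC.element_to_be_clickable((By.XPATH, "//button[contains(text(), 'Not Now')]"))
#         ).click(),
#         missing_errors=(TimeoutException,),
#     )
```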


### Cool!! Now we are on the feed page of the Instagram user interface

## Click on the search box to search for a hashtag

In [6]:
#target the search input field
searchbox = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//input[@placeholder='Search']")))
searchbox.clear()

#search for the hashtag "photography"
keyword = "#photography"
searchbox.send_keys(keyword)

In [7]:
# Target the first suggestion in the pop-up, i.e. #photography, and click it
time.sleep(5) # Wait for 5 seconds
my_link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(@href, '/" + keyword[1:] + "/')]")))
my_link.click()
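
The XPath in the cell above is assembled from the keyword with its leading `#` removed. A small sketch of that string construction, using a hypothetical helper named `hashtag_link_xpath`:

```python
def hashtag_link_xpath(keyword):
    """Return an XPath matching links whose href contains '/<tag>/'."""
    tag = keyword.lstrip("#")  # "#photography" -> "photography"
    return "//a[contains(@href, '/" + tag + "/')]"


print(hashtag_link_xpath("#photography"))
# //a[contains(@href, '/photography/')]
```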

### Hurray!!! We are on the page of the searched hashtag

In [8]:
# Now, if you want to follow that hashtag
time.sleep(5) # Wait for 5 seconds
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,"//button[contains(text(), 'Follow')]"))).click()

**Now we have followed that hashtag**

## Scroll Down

- Increase n_scrolls to select more photos (the number of photos loaded per scroll depends on the screen resolution)

In [9]:
#scroll down 3 times
#You can increase the range to scroll more
time.sleep(5) # Wait for 5 seconds
n_scrolls = 3
for j in range(0, n_scrolls):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
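
A fixed number of scrolls can be replaced by scrolling until the page stops growing. This is a sketch rather than the notebook's method: the function name `scroll_until_stable` and its parameters are hypothetical, and `max_scrolls` is kept as a safety limit because an endless feed may never stop loading.

```python
import time


def scroll_until_stable(driver, max_scrolls=10, pause=5):
    """Scroll to the bottom until the page height stops growing; return the scroll count."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give the page time to load more posts
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return scrolls


# Usage with the notebook's driver:
# scroll_until_stable(driver, max_scrolls=3)
```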

In [10]:
# target all the link elements on the page
# (find_elements(By.TAG_NAME, ...) is used because newer Selenium 4 releases no longer provide find_elements_by_tag_name)
anchors = driver.find_elements(By.TAG_NAME, 'a')
anchors = [a.get_attribute('href') for a in anchors]
# narrow down all links to image links only
anchors = [a for a in anchors if str(a).startswith("https://www.instagram.com/p/")]

print('Found ' + str(len(anchors)) + ' links to images')
anchors[:5]

Found 51 links to images


['https://www.instagram.com/p/CMkeTBoA7-6/',
 'https://www.instagram.com/p/CMkfOOSKtsX/',
 'https://www.instagram.com/p/CMkdSlWgKYm/',
 'https://www.instagram.com/p/CMkhOrIBlFg/',
 'https://www.instagram.com/p/CMkf1OiMY_D/']
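
The same post can be linked more than once on a page, so it may be worth removing duplicates while filtering. The sketch below is one way to do that; `filter_post_links` is a hypothetical helper and the `sample` list is made-up input for illustration.

```python
def filter_post_links(hrefs, prefix="https://www.instagram.com/p/"):
    """Keep hrefs that start with prefix, dropping None values and duplicates, in order."""
    seen = set()
    links = []
    for href in hrefs:
        if href and href.startswith(prefix) and href not in seen:
            seen.add(href)
            links.append(href)
    return links


# Made-up sample input: one post link twice, a None, and a non-post link.
sample = [
    "https://www.instagram.com/p/CMkeTBoA7-6/",
    None,
    "https://www.instagram.com/explore/",
    "https://www.instagram.com/p/CMkeTBoA7-6/",
]
print(filter_post_links(sample))  # ['https://www.instagram.com/p/CMkeTBoA7-6/']
```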

In [11]:
images = []

# follow each image link and extract only the image at index 1
for a in anchors:
    driver.get(a)
    time.sleep(5)
    img = driver.find_elements(By.TAG_NAME, 'img')
    img = [i.get_attribute('src') for i in img]
    if len(img) > 1:  # skip pages with fewer than two <img> elements
        images.append(img[1])

images[:5]

['https://instagram.fblr6-1.fna.fbcdn.net/v/t51.2885-15/e35/p1080x1080/162108594_4396089807071385_7197034857243094626_n.jpg?tp=1&_nc_ht=instagram.fblr6-1.fna.fbcdn.net&_nc_cat=109&_nc_ohc=MlmDvRmzCz4AX_TwpG4&ccb=7-4&oh=8bbd6d2574a6de85b3606a13bd8acbb2&oe=607E13A9&_nc_sid=86f79a',
 'https://instagram.fblr6-1.fna.fbcdn.net/v/t51.2885-15/e35/p1080x1080/161509330_883314219093057_8670781648515280686_n.jpg?tp=1&_nc_ht=instagram.fblr6-1.fna.fbcdn.net&_nc_cat=109&_nc_ohc=i9RVEuKFdQUAX9lssW3&ccb=7-4&oh=a67a86e5c6b15e2976f183fb7b9b66be&oe=607BC6CD&_nc_sid=86f79a',
 'https://instagram.fblr6-1.fna.fbcdn.net/v/t51.2885-15/e35/161279226_441132863885382_5717541955847626570_n.jpg?tp=1&_nc_ht=instagram.fblr6-1.fna.fbcdn.net&_nc_cat=106&_nc_ohc=7xfRzfFd6cUAX9q73Js&ccb=7-4&oh=2ef3c6abec0eb9f0f99ab99226e4b2cc&oe=607EEC4E&_nc_sid=86f79a',
 'https://instagram.fblr6-1.fna.fbcdn.net/v/t51.2885-15/e35/p1080x1080/162406622_4071342549593740_7038571039023670429_n.jpg?tp=1&_nc_ht=instagram.fblr6-1.fna.fbcdn.net&_n

## Save images to computer
- First we'll create a new folder for our images somewhere on our computer. Then, we'll save all the images there.

In [12]:
import os
import wget  # third-party package: pip install wget

path = os.getcwd()
path = os.path.join(path, keyword[1:])

# create the directory (no error if it already exists)
os.makedirs(path, exist_ok=True)

path

'D:\\Study\\Projects\\Instagram\\photography'

In [13]:

#download images
counter = 0
for image in images:
    save_as = os.path.join(path, keyword[1:] + str(counter)  + '.jpg')
    wget.download(image, save_as)
    counter += 1

100% [............................................................................] 209337 / 209337
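
If installing `wget` is not an option, the standard library can do the same job. The sketch below swaps in `urllib.request.urlretrieve` and skips URLs that fail instead of stopping the loop. The function name `download_images` and its `fetch` parameter are hypothetical; `fetch` exists only so the downloader can be replaced, for example when testing.

```python
import os
import urllib.request


def download_images(urls, folder, prefix, fetch=urllib.request.urlretrieve):
    """Save each URL as <folder>/<prefix><n>.jpg; return the list of saved paths."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for counter, url in enumerate(urls):
        save_as = os.path.join(folder, prefix + str(counter) + ".jpg")
        try:
            fetch(url, save_as)
        except OSError as error:  # urllib's URLError is a subclass of OSError
            print("Skipping " + url + ": " + str(error))
            continue
        saved.append(save_as)
    return saved


# Usage with the notebook's variables:
# download_images(images, path, keyword[1:])
```

File names keep the same numbering as the loop above, so a skipped URL leaves a gap in the sequence rather than shifting later files.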

# Bang!! Check the directory and you will find the downloaded images

## Click on the search box to search for a profile by profile name automatically

In [15]:
#target the search input field
searchbox = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//input[@placeholder='Search']")))
searchbox.clear()

#search for the profile name
profilename = "sketch_listening__"
searchbox.send_keys(profilename)

In [18]:
# Target the first suggestion in the pop-up, i.e. sketch_listening__, and open it
time.sleep(5) # Wait for 5 seconds
searchbox.send_keys(Keys.ENTER)
searchbox.send_keys(Keys.ENTER)

# Note - Check whether you need to send the above command once or twice
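
Since the number of ENTER presses can vary, another option is to open the profile page by URL. This sketch assumes profile pages live at `https://www.instagram.com/<username>/`; that pattern is an assumption here, not something the notebook verifies, and `profile_url` is a hypothetical helper.

```python
def profile_url(profilename, base="https://www.instagram.com/"):
    """Build the profile page URL for profilename (assumed URL pattern)."""
    return base + profilename.strip().lstrip("@") + "/"


print(profile_url("sketch_listening__"))
# https://www.instagram.com/sketch_listening__/

# With the notebook's driver:
# driver.get(profile_url(profilename))
```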

# Conclusion - With the code above we can automatically log in to an Instagram account, download posts for a hashtag, and search for anyone's profile by username