## Introduction to Selenium: An overview of what Selenium is, what it can do, and its use cases.

I am excited to introduce you to Selenium, which opens up new possibilities for web scraping.

When the data you want is already present in a page's HTML and no user interaction is needed, you can use the Requests library to send an HTTP request and retrieve the HTML, then use Beautiful Soup to parse it and extract the relevant information. However, if the page requires user interaction, such as clicking a button or filling out a form, you'll need a tool like Selenium to automate those interactions before you scrape the data. In short: use Requests and Beautiful Soup for static webpages, and use Selenium for dynamic webpages that require user interaction.
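One rough way to choose between the two approaches is to fetch the page without a browser and check whether the text you are after is already in the raw HTML. The sketch below is only a heuristic under stated assumptions: the helper name `appears_in_static_html` and both sample pages are made up for illustration, and the standard library's `html.parser` stands in for Beautiful Soup so that the example has no third-party dependencies.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a <script> or <style> element
        if self._skip_depth == 0:
            self.chunks.append(data)


def appears_in_static_html(raw_html, expected_text):
    """Return True if expected_text is part of the page text in raw_html."""
    parser = _TextCollector()
    parser.feed(raw_html)
    parser.close()
    return expected_text in "".join(parser.chunks)


# Made-up pages: the first contains the value directly in its HTML,
# the second only produces it by running JavaScript in a browser.
static_page = "<html><body><h1>Air quality</h1><p>PM2.5: 42</p></body></html>"
dynamic_page = (
    "<html><body><div id='app'></div>"
    "<script>render('PM2.5: 42')</script></body></html>"
)

print(appears_in_static_html(static_page, "PM2.5: 42"))   # True
print(appears_in_static_html(dynamic_page, "PM2.5: 42"))  # False
```

In practice you would pass in the HTML returned by Requests (for example `requests.get(url).text`). If the text you can see in your browser is missing from that HTML, the page is probably rendered dynamically and Selenium is the better fit.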

Let's say you want to scrape a website that requires you to log in before you can access the content. Requests and Beautiful Soup alone are often not enough here, because the login form has to be filled in and submitted, and many sites rely on JavaScript to do this. With Selenium, you can automate the login process by navigating to the login page, entering your credentials, and submitting the login form. Once you're logged in, you can use Selenium to navigate to the pages with the content you want and then use Beautiful Soup to extract the relevant information.
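The login steps described above can be sketched as follows. This is a minimal sketch, not a recipe for any particular site: the environment variable names, the form field names (`username`, `password`), and the submit-button selector are all assumptions, so inspect the real login form to find the right ones.

```python
import os


def load_credentials(prefix="SCRAPER"):
    """Read the username and password from environment variables.

    The variable names (SCRAPER_USERNAME, SCRAPER_PASSWORD) are hypothetical;
    reading them from the environment keeps credentials out of the notebook.
    """
    username = os.environ.get(f"{prefix}_USERNAME")
    password = os.environ.get(f"{prefix}_PASSWORD")
    if not username or not password:
        raise RuntimeError(f"Set {prefix}_USERNAME and {prefix}_PASSWORD first")
    return username, password


def log_in(driver, login_url, username, password):
    """Open the login page, fill in the form, and submit it."""
    # Imported here so load_credentials() works even without Selenium installed
    from selenium.webdriver.common.by import By

    driver.get(login_url)
    # Assumed field names and button selector; adjust them to the real form
    driver.find_element(By.NAME, "username").send_keys(username)
    driver.find_element(By.NAME, "password").send_keys(password)
    driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()
```

With a WebDriver instance called `driver`, usage would look like `log_in(driver, login_url, *load_credentials())`, where `login_url` is the address of the site's login page.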


#### Project Possibilities
Here are some specific ideas for what Selenium can make possible in your investigative research:

1. Investigating government corruption: To investigate government corruption in Iran, a journalist might use Selenium to scrape government websites and procurement portals to collect information on government contracts, bids, and transactions. Selenium could be used to simulate user interactions with these websites, such as navigating through menus, filling out search forms, and downloading data files.

2. Mapping social and political movements: To map social and political movements in Iran, a journalist might use Selenium to scrape social media platforms, such as Twitter or Facebook, to collect data on user activities, hashtags, and mentions. Selenium could be used to simulate user interactions with these platforms, such as logging in, searching for keywords, and scrolling through feeds.

3. Tracking environmental issues: To track environmental issues in Iran, a journalist might use Selenium to scrape websites that provide environmental data, such as weather forecasts or air quality reports. Selenium could be used to simulate user interactions with these websites, such as selecting locations, dates, and types of data.

4. Analyzing economic trends: To analyze economic trends in Iran, a journalist might use Selenium to scrape financial data from websites, such as stock market data or economic indicators. Selenium could be used to simulate user interactions with these websites, such as selecting dates, time periods, and types of data.

#### Setting up the Selenium environment: How to install and configure the Selenium framework, including the Selenium WebDriver, which allows you to automate interactions with a web page.


You can install Selenium using pip, which is Python's package manager. We use the following command:

In [None]:
!pip install selenium

Collecting selenium
  Downloading selenium-4.9.1-py3-none-any.whl (6.6 MB)
Collecting trio~=0.17
  Downloading trio-0.22.0-py3-none-any.whl (384 kB)
Collecting trio-websocket~=0.9
  Downloading trio_websocket-0.10.2-py3-none-any.whl (17 kB)
Collecting async-generator>=1.9
  Using cached async_generator-1.10-py3-none-any.whl (18 kB)
Collecting exceptiongroup>=1.0.0rc9
  Downloading exceptiongroup-1.1.1-py3-none-any.whl (14 kB)
Collecting outcome
  Downloading outcome-1.2.0-py2.py3-none-any.whl (9.7 kB)
Collecting wsproto>=0.14
  Downloading wsproto-1.2.0-py3-none-any.whl (24 kB)
Installing collected packages: wsproto, outcome, exceptiongroup, async-generator, trio, trio-websocket, selenium
Successfully inst

Download a WebDriver: To interact with a web browser, you need to download a WebDriver that corresponds to the browser you want to use. There are different WebDriver implementations for different browsers like Chrome, Firefox, and Edge. In this guide, we'll be using the Chrome WebDriver.

You can download the Chrome WebDriver from the official website: https://sites.google.com/a/chromium.org/chromedriver/downloads.

Download the version that corresponds to the version of Chrome you have installed on your computer. Once you've downloaded the WebDriver, extract it to a folder on your computer.
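To check that the driver you downloaded matches your browser, you can compare the version strings that the two programs print. The sketch below makes several assumptions: the helper names are hypothetical, the sample strings only imitate the format printed by `google-chrome --version` (the command name on Linux; it differs on other systems) and `chromedriver --version`, and comparing only the major version is a simplification.

```python
import re


def major_version(version_output):
    """Return the first number of a dotted version found in the text."""
    match = re.search(r"(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError(f"No version number found in: {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """Return True if Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


# Illustrative strings; the exact numbers are made up for this example.
chrome_output = "Google Chrome 113.0.5672.126"
driver_output = "ChromeDriver 113.0.5672.63 (0e1a4471)"

print(versions_match(chrome_output, driver_output))  # True
```

If the function returns False, download the ChromeDriver release whose major version matches your installed browser.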

Set up the environment: In your Python code, you need to import the Selenium WebDriver module and create an instance of the WebDriver. Here's an example code snippet:


In [None]:
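from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to the Chrome WebDriver executable
chrome_driver_path = "path-to-webdriver"

# Create a new instance of the Chrome WebDriver.
# Selenium 4 expects the driver path to be wrapped in a Service object.
driver = webdriver.Chrome(service=Service(chrome_driver_path))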

# Navigate to a webpage
driver.get("https://www.google.com")

# Close the browser window
driver.quit()



In this example, the WebDriver navigates to the Google homepage, and then the driver.quit() method is called to close the browser window. Replace the value of the chrome_driver_path variable with the path to the Chrome WebDriver executable that you downloaded and extracted earlier.

That's it! You now have a basic setup for using Selenium in Python. From here, you can use the WebDriver to interact with a webpage, perform user actions, and extract data.

#### How to use Selenium to do amazing web scraping!

We now provide an example of scraping with Selenium. Scraping with Selenium involves four steps:
1.   Set up a WebDriver object to use to navigate the website.
2.   Use the WebDriver to navigate to the website.
3.   Identify the steps needed to extract the desired information from the webpage. This includes identifying which clicks and other interactions to make and which specific pieces of data to extract.
4.   Inspect the HTML/CSS of the webpage to identify the page elements associated with the interactions in step 3. Based on that knowledge, use Selenium to automate interactions with the site.

In this example, we use Selenium to execute a search on the Divar site and extract the search results on the first page. The first step is to set up the WebDriver object as described above. If we'd like, we can run the browser in "headless" mode so that we never see it.

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Set up options for the Chrome WebDriver
options = webdriver.ChromeOptions()
# options.add_argument('--headless')  # Uncomment to run Chrome in headless mode

# Path to the Chrome WebDriver executable
chrome_driver_path = 'path-to-webdriver'

# Create a new instance of the Chrome WebDriver.
# Selenium 4 expects the driver path to be wrapped in a Service object.
driver = webdriver.Chrome(service=Service(chrome_driver_path), options=options)



Now we navigate to the site:

In [None]:
# Navigate to the Divar listings page for Tehran
driver.get("https://divar.ir/s/tehran")

Now that we've navigated to the Divar site, we need to determine the page element associated with the search bar so that we can execute a search. We can do this by right-clicking on the search bar and clicking "Inspect Element." We look for tags such as \<input\>, \<form\>, or specific classes or IDs associated with the search bar. The search bar element might also have a placeholder attribute or specific styling properties. We find that there are two \<input\> tags and that the class attribute of the search bar's \<input\> element is set to "kt-nav-text-field__input". Therefore, we can use the class name to identify the input and type in our search query:

In [None]:
search_input = driver.find_element(By.CLASS_NAME, "kt-nav-text-field__input")
search_input.send_keys("hello")

Now that we've inputted our search query, we can press the "return" key and wait up to 10 seconds for the search results to load.

In [None]:
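search_input.send_keys(Keys.RETURN)
# Wait up to 10 seconds for at least one result title to appear
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CLASS_NAME, "kt-post-card__title")))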

To extract the search results on the first page, we need to determine the page elements associated with them. We find that \<h2 class="kt-post-card__title"\>ست آچار کیانلی مدل Hello Phillips\</h2\> contains the title of one of the search results.

We can try collecting every element with the class "kt-post-card__title" and see if that captures the desired information.

In [None]:
elements = driver.find_elements(By.CLASS_NAME, "kt-post-card__title")

Now, we extract text from each of the elements and display it.

In [None]:
text_list = [element.text for element in elements]

In [None]:
text_list

['پک کامل دستگاه تتو پن Fk hello با تمامی تجهیزات',
 '',
 'ست آچار کیانلی مدل Hello Phillips',
 '',
 'کالسکه Hello baby کم کارکرد',
 '',
 'هشداردهنده صوتی Hello well com',
 '',
 'ساعت چرمی Hello kitty',
 '',
 'دستگاه تتو پن مدل HELLO زاین',
 '',
 'فلاسکhello dream',
 '',
 'لباس ۳ تیکه نوزادی برند ( Hello beyby )',
 '',
 'لباس سرهمی نوزادی پسرانه برند ( Hello beyby )',
 '',
 'لباس سرهمی نوزادی برند ( Hello beyby )',
 '',
 'عروسک سگ بالشتی طرح hello',
 '',
 'تی شرت آستین کوتاه و شلوار جنس ترک طرح Hello',
 '']

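Hooray! We see that we have obtained a list of titles from the webpage. The empty strings in between likely come from elements that share the same class but have no visible text.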
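Before using the titles, it helps to drop the empty strings and store the results. The following is a small sketch using only the standard library; the helper names `clean_titles` and `titles_to_csv` are hypothetical, and the sample list is a shortened copy of the output above.

```python
import csv
import io


def clean_titles(texts):
    """Strip whitespace and drop empty strings, keeping the original order."""
    return [text.strip() for text in texts if text.strip()]


def titles_to_csv(titles):
    """Return the titles as CSV text with a single 'title' column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["title"])
    writer.writerows([title] for title in titles)
    return buffer.getvalue()


# Shortened sample of the scraped list, including the empty entries
sample = ["ساعت چرمی Hello kitty", "", "فلاسکhello dream", ""]
titles = clean_titles(sample)

print(len(titles))  # 2
print(titles_to_csv(titles))
```

To save the results to disk, write the returned text to a file opened with encoding="utf-8" and newline="" so that the Persian text and the line endings are preserved.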

You now have a taste of the power of Selenium. I hope that this opens the door to powerful and creative projects for you!