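- **Web Scraping Setup and Execution**: The code sets up a Selenium WebDriver for Chrome, with ChromeDriver installed through `webdriver_manager`, and opens the forum's user directory sorted by likes received (`https://discourse.onlinedegree.iitm.ac.in/u?order=likes_received&period=all`). It allows up to 100 seconds for page elements to appear and then repeatedly scrolls to the bottom of the page, pausing 4 seconds after each scroll, until the page height stops growing, which means no new users are being loaded.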

- **Data Extraction**: The code finds all `<div>` elements with the class `user-detail` (3801 in this run) and reads each user's `title`. Only if the `title` contains the word "course" (case-insensitive) does it extract that user's `username` and `name`; 123 users matched in this run.

- **Data Storage and Output**: The extracted data is collected in a dictionary of lists and converted into a Pandas DataFrame. The DataFrame is saved to `instructors_TAs.csv` and then printed as a Markdown table (via `DataFrame.to_markdown()`, which requires the `tabulate` package), allowing for easy viewing and further processing of the user details.
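
The steps above rely on four third-party packages: `selenium`, `webdriver_manager`, `pandas`, and `tabulate`. A minimal sketch, assuming only the standard library, for checking which of them are missing from the current kernel before running the cells below:

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that the import system cannot find."""
    # find_spec() returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["selenium", "webdriver_manager", "pandas", "tabulate"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

The sketch checks import names only; the name used with `pip install` can differ from the import name.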

In [9]:
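# Uncomment the lines below to install the dependencies on first run.
# tabulate is needed by DataFrame.to_markdown().
# %pip install selenium
# %pip install webdriver_manager
# %pip install pandas
# %pip install tabulate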

Note: you may need to restart the kernel to use updated packages.
Collecting tabulate
  Downloading tabulate-0.9.0-py3-none-any.whl.metadata (34 kB)
Downloading tabulate-0.9.0-py3-none-any.whl (35 kB)
Installing collected packages: tabulate
Successfully installed tabulate-0.9.0


In [1]:
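# List the Python interpreters found on PATH (Windows `where`; on macOS/Linux use `!which -a python`)
!where python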

d:\IITM ALL\ANALYTICS TEAM\myenv\Scripts\python.exe
C:\Users\Lenovo\AppData\Local\Microsoft\WindowsApps\python.exe


In [2]:
import sys
# Show the interpreter that runs this notebook kernel
print(sys.executable)

d:\IITM ALL\ANALYTICS TEAM\myenv\Scripts\python.exe


In [4]:
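from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
import time

# Set up the Chrome WebDriver (webdriver_manager downloads a matching ChromeDriver)
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)
# Open the URL
url = "https://discourse.onlinedegree.iitm.ac.in/u?order=likes_received&period=all"
driver.get(url)

# The user list is rendered by JavaScript after the initial page load,
# so wait (up to 100 s) until the first user entry is present.
# Raises TimeoutException if no entry appears in time.
WebDriverWait(driver, 100).until(
    EC.presence_of_element_located((By.CLASS_NAME, "user-detail"))
)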
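
The setup cell allows up to 100 seconds for page elements to appear. The idea behind such a wait is polling: check a condition repeatedly until it holds or a timeout expires. The following is a simplified, standard-library illustration of that idea, not Selenium's implementation; in the notebook itself, use Selenium's own waits.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Stand-in condition for a dry run: becomes truthy on the third check.
checks = {"count": 0}


def ready_on_third_check():
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_until(ready_on_third_check, timeout=5, poll=0.01))  # prints True
```

With a live driver, the condition could be `lambda: driver.find_elements(By.CLASS_NAME, "user-detail")`, which returns an empty (falsy) list as long as no user entry is found.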

In [5]:
# Scroll down to load more user details
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to the bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait for new content to load
    time.sleep(4)

    # Stop once the page height no longer grows, i.e. no new users were loaded
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
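
The scroll loop above can be wrapped in a function so that the pause and an optional scroll limit (handy for a quick test run) become parameters. A minimal sketch; `FakeDriver` is a hypothetical stand-in used only to exercise the loop without a browser, and in the notebook the real `driver` would be passed instead.

```python
import time


def scroll_until_stable(driver, pause=4.0, max_scrolls=None):
    """Scroll to the bottom until the page height stops growing.

    Stops early after `max_scrolls` scrolls if a limit is given.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while max_scrolls is None or scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load more entries
        scrolls += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return scrolls


class FakeDriver:
    """Hypothetical stand-in: the page grows on each scroll until heights run out."""

    def __init__(self, heights):
        self.heights = heights
        self.position = 0

    def execute_script(self, script):
        if script.startswith("return"):
            return self.heights[self.position]
        # Any other script is treated as a scroll.
        self.position = min(self.position + 1, len(self.heights) - 1)


full_run = scroll_until_stable(FakeDriver([1000, 2000, 3000]), pause=0)
test_run = scroll_until_stable(FakeDriver([1000, 2000, 3000]), pause=0, max_scrolls=1)
print(full_run, test_run)  # prints 3 1
```

In the notebook, the call would be `scroll_until_stable(driver)` for a full run or `scroll_until_stable(driver, max_scrolls=2)` for a short trial.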

In [6]:
# Find all <div> tags with class "user-detail"
user_details = driver.find_elements(By.CLASS_NAME, 'user-detail')
dataframe = {"username": [], "name": [], "title": []}

# Extract the details of each user whose title mentions "course"
for user in user_details:
    # Check if the title is present; skip users without one
    title_element = user.find_elements(By.CLASS_NAME, 'title')
    if not title_element:
        continue
    title = title_element[0].text
    if "course" in title.lower():
        # Extract the username and name
        username = user.find_element(By.CLASS_NAME, 'username').text
        name = user.find_element(By.CLASS_NAME, 'name').text
        # Append the details to the column lists
        dataframe["username"].append(username)
        dataframe["name"].append(name)
        dataframe["title"].append(title)

# Build the DataFrame and show its (rows, columns) shape
df = pd.DataFrame(dataframe)
df.shape

(123, 3)
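
The filtering rule in the cell above does not depend on Selenium and can be checked on its own. A minimal sketch, assuming each user has already been reduced to a plain dict; the sample records are hypothetical and are not taken from the scraped page. A missing or empty title is treated as no match.

```python
def has_course_title(title, keyword="course"):
    """Return True if the title contains the keyword, ignoring case."""
    # Treat a missing title (None or "") as "no match".
    return keyword.lower() in (title or "").lower()


def filter_by_title(records, keyword="course"):
    """Keep only the records whose "title" field contains the keyword."""
    return [r for r in records if has_course_title(r.get("title"), keyword)]


# Hypothetical sample records, for illustration only.
sample = [
    {"username": "user_a", "name": "User A", "title": "Course_Team"},
    {"username": "user_b", "name": "User B", "title": ""},
    {"username": "user_c", "name": "User C", "title": "COURSE instructor"},
    {"username": "user_d", "name": "User D", "title": None},
]

matches = filter_by_title(sample)
print([r["username"] for r in matches])  # prints ['user_a', 'user_c']
```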

In [7]:
# Save the filtered users to a CSV file (without the index column)
df.to_csv("instructors_TAs.csv", index=False)

In [8]:
# Total number of user entries loaded on the page, before filtering
len(user_details)

3801

In [10]:
# Convert DataFrame to a Markdown table (to_markdown() requires the tabulate package)
markdown_table = df.to_markdown(index=False)
print(markdown_table)


| username          | name                     | title          |
|:------------------|:-------------------------|:---------------|
| Karthik_POD       | Karthik Thiagarajan      | Course_Team    |
| AbhishekPOD       | Abhishek                 | Course_Team    |
| AtulPS            | ATUL PRATAP SINGH        | Course_Team    |
| Nikita            | Nikita Kumari            | Course_Team    |
| santhanakrishnan  | Santhana Krishnan S      | Course_Team    |
| Milo              | Dr. Malolan Sundararaman | Course_Team    |
| PiyushW           | Piyush Wairale           | Course_Team    |
| Nitin_Jha         | Nitin Kumar Jha          | Course_Team    |
| AdarshMadre       | Adarsh Madre             | Course_Team    |
| sushmitha         | Sushmitha P              | Course_Team    |
| Omkar_Joshi       | Omkar Joshi              | Course_Team    |
| jimmi             | Jimmi Kumar Bharti       | Course_Team    |
| subhasis          | Subhasis                 | Course_Team    |
| carlton 
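
`DataFrame.to_markdown()` relies on the `tabulate` package, the one installed in the first cell. Where installing it is not an option, a similar pipe table can be produced with the standard library alone. This is a minimal sketch with hypothetical sample rows; it is not a reimplementation of `to_markdown()` and it does not escape `|` characters inside cells.

```python
def to_markdown_table(rows, columns):
    """Render a list of dicts as a left-aligned Markdown pipe table."""
    # Width of each column: its longest cell, header included (minimum 3).
    widths = {
        col: max(3, len(col), *(len(str(row.get(col, ""))) for row in rows))
        for col in columns
    }

    def render(cells):
        padded = (str(cell).ljust(widths[col]) for cell, col in zip(cells, columns))
        return "| " + " | ".join(padded) + " |"

    header = render(columns)
    # ":---" marks a left-aligned column.
    rule = "|" + "|".join(":" + "-" * (widths[col] + 1) for col in columns) + "|"
    body = [render([row.get(col, "") for col in columns]) for row in rows]
    return "\n".join([header, rule, *body])


# Hypothetical sample rows, for illustration only.
rows = [
    {"username": "user_a", "name": "User A", "title": "Course_Team"},
    {"username": "user_c", "name": "User C", "title": "Course_Team"},
]
table = to_markdown_table(rows, ["username", "name", "title"])
print(table)
```

With the real data, `df.to_dict("records")` would supply the rows and `list(df.columns)` the column names.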

In [11]:
# Close the browser
driver.quit()
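
With the browser closed in a separate final cell, an error in an earlier cell during "Run All" leaves the browser open until this cell is run by hand. If the same steps are moved into a script, a `try`/`finally` wrapper makes sure the browser is closed even when scraping fails. A minimal sketch; `FakeDriver` is a hypothetical stand-in so the example runs without a browser, and in practice `make_driver` would create the real Chrome WebDriver.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Create a driver, hand it to the caller, and always quit it afterwards."""
    driver = make_driver()
    try:
        yield driver
    finally:
        driver.quit()  # runs on success and on error


class FakeDriver:
    """Hypothetical stand-in that only records whether quit() was called."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


fake = FakeDriver()
try:
    with managed_driver(lambda: fake) as driver:
        raise RuntimeError("scraping failed")
except RuntimeError:
    pass

print(fake.closed)  # prints True
```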