## Selenium

If the page you are scraping is generated dynamically, you will need a browser automation tool such as [**Selenium**](https://www.selenium.dev/documentation/webdriver/). Selenium was designed for **testing** websites: it automates what a human tester would otherwise do by hand, which is to use the site and look for bugs. It can click buttons, open menus, scroll, and navigate, triggering essentially any event a visiting user could, so that a test can check that everything performs as expected.

Imitating a user's journey through the website is how we will access the data we need, since many JavaScript functions won't fire until triggered by a user event, and we want the data they return! 

In [1]:
from selenium import webdriver
import pandas as pd

# Open a Chrome window that Selenium controls
driver = webdriver.Chrome()

In [2]:
driver.quit()
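
Every `webdriver.Chrome()` call opens a new browser, so each session should end with `driver.quit()`, even when a cell fails halfway. One way to guarantee that is a small context manager. This is a minimal sketch, not part of Selenium: `browser_session` and `make_driver` are hypothetical names, and `make_driver` is assumed to be any zero-argument callable that returns a driver, such as `webdriver.Chrome`.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(make_driver):
    """Yield a driver and always quit it, even if the body raises."""
    driver = make_driver()
    try:
        yield driver
    finally:
        # quit() closes all windows and ends the WebDriver session
        driver.quit()
```

With Selenium imported, this could be used as `with browser_session(webdriver.Chrome) as driver:` followed by the scraping code in the indented body.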

In [3]:
driver = webdriver.Chrome()
driver.get('https://example.com/')  # load the page in the controlled browser
print(driver.page_source)           # print the page's HTML

<html><head>
    <title>Example Domain</title>

    <meta charset="utf-8">
    <meta http-equiv="Content-type" content="text/html; charset=utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>    
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustr

In [4]:
driver.quit()

In [5]:
driver = webdriver.Chrome()
driver.get('https://lms.codeacademyberlin.com/')
# Log in manually in the browser window before running the next cell
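
After a manual login the posts may take a moment to render, and a lookup that runs too early simply finds nothing. Selenium ships `WebDriverWait` for this situation. The sketch below shows the same polling idea in plain Python so it can be tried without a browser. `wait_until` and its parameters are hypothetical names, not Selenium API.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns something truthy.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

Because `find_elements` returns an empty list while nothing matches, a call such as `wait_until(lambda: driver.find_elements(By.CSS_SELECTOR, ".MuiCard-root"), timeout=60)` would keep polling until at least one card exists, then return the list.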


In [6]:
from selenium.webdriver.common.by import By

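# Each post is a card carrying several classes, so match them with one CSS selector
# (By.CLASS_NAME is meant for a single class name)
posts = driver.find_elements(By.CSS_SELECTOR, ".MuiPaper-root.MuiCard-root.sc-ikkxIA.iXIOzD.MuiPaper-elevation1.MuiPaper-rounded")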
posts

[<selenium.webdriver.remote.webelement.WebElement (session="03dd890e8858e0ceb6406ad4d6fbf278", element="f.FA65FFB0DCDC879E42DA3D8881A80D79.d.BF3CB2647B814F35FDBF2A31A151EDAC.e.78")>,
 <selenium.webdriver.remote.webelement.WebElement (session="03dd890e8858e0ceb6406ad4d6fbf278", element="f.FA65FFB0DCDC879E42DA3D8881A80D79.d.BF3CB2647B814F35FDBF2A31A151EDAC.e.92")>,
 <selenium.webdriver.remote.webelement.WebElement (session="03dd890e8858e0ceb6406ad4d6fbf278", element="f.FA65FFB0DCDC879E42DA3D8881A80D79.d.BF3CB2647B814F35FDBF2A31A151EDAC.e.100")>,
 <selenium.webdriver.remote.webelement.WebElement (session="03dd890e8858e0ceb6406ad4d6fbf278", element="f.FA65FFB0DCDC879E42DA3D8881A80D79.d.BF3CB2647B814F35FDBF2A31A151EDAC.e.101")>,
 <selenium.webdriver.remote.webelement.WebElement (session="03dd890e8858e0ceb6406ad4d6fbf278", element="f.FA65FFB0DCDC879E42DA3D8881A80D79.d.BF3CB2647B814F35FDBF2A31A151EDAC.e.102")>,
 <selenium.webdriver.remote.webelement.WebElement (session="03dd890e8858e0ceb6406a

In [7]:
from selenium.common.exceptions import NoSuchElementException

poster = []
titles = []
content = []

for post in posts:
    poster.append(post.find_element(By.CSS_SELECTOR, "span.MuiTypography-displayBlock").text)
    titles.append(post.find_element(By.CSS_SELECTOR, "h2.MuiTypography-root.sc-kpDqfm.jflsKu.MuiTypography-h2").text)
    try:
        content.append(post.find_element(By.CSS_SELECTOR, "div.wmde-markdown.wmde-markdown-color").text)
    except NoSuchElementException:
        # Fall back to a placeholder when the post has no markdown body element
        content.append("no content")

posts_df = pd.DataFrame({
    'title': titles,
    'posted_by': poster,
    'content': content
})

posts_df

| | title | posted_by | content |
|---|---|---|---|
| 0 | Black Owls Graduation! | Lucas | Tomorrow, we celebrate the graduation of the B... |
| 1 | Upcoming workshop "Start your Career in Tech" ... | Jost | Join us for an inspiring and relaxed evening w... |
| 2 | AI Meetup! | Emily | Code Academy will be hosting a meetup for AI e... |
| 3 | Don't miss tomorrow's Purple Cat graduation | Jost | |
| 4 | CAB Pub Quizz - 20.11.2024 | Lucas | |
| 5 | Movie Night! | Emily | Join us for a Movie Night in the CAB Kino on O... |
| 6 | Give your review of CAB! | Lucas | Help us by sharing your experience at Code Aca... |
| 7 | The Academy loves you! | Lucas | |
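
The extraction loop can also be written without exception handling, because `find_elements` (plural) returns an empty list when nothing matches instead of raising. The sketch below rests on several assumptions: `text_or_default`, `extract_posts`, and `CSS` are hypothetical names, and the selectors are shortened to the stable-looking Material UI classes, on the guess that generated class names such as `sc-kpDqfm` change between builds of the site.

```python
CSS = "css selector"  # the string value of Selenium's By.CSS_SELECTOR


def text_or_default(element, selector, default=""):
    """Return the text of the first match inside element, or default."""
    # find_elements returns an empty list when nothing matches
    matches = element.find_elements(CSS, selector)
    return matches[0].text if matches else default


def extract_posts(posts):
    """Turn post card elements into a list of plain dictionaries."""
    rows = []
    for post in posts:
        rows.append({
            "title": text_or_default(post, "h2.MuiTypography-h2"),
            "posted_by": text_or_default(post, "span.MuiTypography-displayBlock"),
            "content": text_or_default(post, "div.wmde-markdown", default="no content"),
        })
    return rows
```

The result can be passed straight to pandas with `pd.DataFrame(extract_posts(posts))`. Keeping one dictionary per post also avoids three parallel lists drifting out of step if a lookup fails midway.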


In [8]:
# Write a test comment on the first post (it is deleted again afterwards)
inputs = posts[0].find_elements(By.TAG_NAME, "input")
# Index 1 is assumed to be the comment field of the card
inputs[1].send_keys("testing")

In [9]:
submit_button = posts[0].find_element(By.CSS_SELECTOR, 'button[type="submit"]')
submit_button.click()

In [10]:
# The new comment is assumed to be the last item in the post's comment list
comments_list = posts[0].find_elements(By.CLASS_NAME, "MuiListItem-container")
delete_button = comments_list[-1].find_element(By.CSS_SELECTOR, 'button[aria-label="delete"]')
delete_button.click()
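
Taking `comments_list[-1]` assumes the test comment is the last item, which could delete someone else's comment if another one arrives in between. A more defensive sketch picks the comment by its text first. `last_comment_with_text` is a hypothetical helper, and it only relies on each element exposing a `.text` attribute, as Selenium's `WebElement` does.

```python
def last_comment_with_text(comments, text):
    """Return the last comment whose visible text contains `text`, or None."""
    for comment in reversed(comments):
        if text in comment.text:
            return comment
    return None


# Possible use with the cells above (requires the live browser session):
# target = last_comment_with_text(comments_list, "testing")
# if target is not None:
#     target.find_element(By.CSS_SELECTOR, 'button[aria-label="delete"]').click()
```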


In [11]:
driver.quit()