
Quotchen/usw-code-2025

 
 

USW Summer Semester (SoSe) 2025

Code for lecture USW 2025

Prerequisites

Each folder contains the exercises for one of the PCÜ sessions held throughout the semester. Create a virtual environment by running python -m venv venv, activate it (source venv/bin/activate on WSL/Linux), and install the requirements for the exercises (pip install -r requirements.txt).

Activate the virtual environment on Windows with PowerShell:

.\venv\Scripts\Activate.ps1

or on Windows with Command Prompt:

.\venv\Scripts\activate.bat
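
To confirm that the environment is really active before installing the requirements, a small check along these lines may help. This is a minimal sketch and not part of the repository; the function name in_virtualenv is hypothetical. It relies on the fact that inside an environment created with python -m venv, sys.prefix differs from sys.base_prefix.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv.

    Inside an environment created by `python -m venv`, sys.prefix
    points at the environment while sys.base_prefix points at the
    base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; activate venv first.")
```

Running the script with the interpreter you intend to use for the exercises shows whether pip install would land in the venv or in the system installation.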

Exercises that use Jupyter Notebooks require a locally running Jupyter server. Start the server with the following command:

jupyter notebook

PCÜ 1: SPAM Detection

In this first exercise, we develop a supervised machine learning model for spam detection, based on the following dataset: SMS SPAM Collection.

This exercise will be carried out with Jupyter Notebooks.
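
As a rough illustration of what a supervised spam classifier does, the following sketch trains a tiny naive Bayes model in plain Python. It is an assumption made for illustration only: the notebooks may use a different model and library, and the four training messages are made up rather than taken from the SMS SPAM Collection. The names TRAIN, tokenize, train, and predict are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy data; the exercise itself uses the SMS SPAM Collection.
TRAIN = [
    ("spam", "Win a free prize now"),
    ("spam", "Free entry, claim your cash prize"),
    ("ham", "Are we still meeting for lunch today"),
    ("ham", "Call me when you get home"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(samples):
    """Count messages per label and word frequencies per label."""
    label_counts = Counter()
    word_counts = {}
    for label, text in samples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior score."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        counts = word_counts[label]
        # Laplace (add-one) smoothing avoids log(0) for unseen words.
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n / total)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict(model, "Claim your free prize"))
print(predict(model, "Lunch at home today"))
```

With this toy data the first message is classified as spam and the second as ham. A real model would be trained and evaluated on separate splits of the full dataset.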

PCÜ 2: News and Web Crawling

This exercise uses the Scrapy framework. Find more information about the architecture of the framework here.

This tutorial also covers rendering websites that depend on JavaScript.

JavaScript rendering requires a headless browser. To install one:

  • Run pip install -r requirements.txt again
  • Install the browser's system dependencies: sudo playwright install-deps on Linux, playwright install-deps on Windows
  • Install headless Chromium with playwright install chromium

Once the installation is finished, you are ready to run the code.
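
Before starting the crawl, it can be useful to verify that the Python packages from the steps above are importable. The sketch below is an assumption, not part of the repository: the helper name missing_modules is hypothetical, and scrapy and playwright are assumed to be the import names of the installed packages. It checks the Python packages only and does not check whether the Chromium browser binary itself is installed.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the Scrapy and Playwright packages.
    missing = missing_modules(["scrapy", "playwright"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("Scrapy and Playwright are importable.")
```

If a package is reported as missing, the most likely cause is that the virtual environment is not active or that pip install -r requirements.txt was not rerun.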

  1. Change into the directory 02_web_and_news_scraping/scrapy_tutorial
  2. Run scrapy crawl htw_berlin in the Command Prompt or Bash
