GitHub - abdoulfataoh/lefaso-net-scraper: The ultimate library for data scientists to scrape data from https://www.lefaso.net

lefaso-net-scraper

Description

lefaso-net-scraper is a Python library for extracting articles from www.lefaso.net, a popular online news source in Burkina Faso. It collects article content together with the comments that readers post on lefaso.net.

Important

Our scraper, like any scraper, depends on the structure of the target website, so changes to that structure can break it. We run automated workflows frequently to detect such breakages, but we cannot catch all of them. Please use the latest version and report any issues you encounter.

Data Format

Field	Description
article_topic	article topic
article_title	article title
article_published_date	article published date
article_origin	article origin
article_url	article URL
article_content	article content
article_comments	article comments
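
Because the scraper depends on the site's structure, it can be useful to check that a scraped record still carries every field listed above. The following is a minimal sketch, not part of the library: `missing_fields` and the sample record are hypothetical, and it assumes each record is a plain dict keyed by the field names in the table.

```python
# Field names copied from the Data Format table above.
EXPECTED_FIELDS = (
    "article_topic",
    "article_title",
    "article_published_date",
    "article_origin",
    "article_url",
    "article_content",
    "article_comments",
)


def missing_fields(record):
    """Return the expected field names that are absent from one record.

    Hypothetical helper: assumes `record` is a dict keyed by field name.
    """
    return [name for name in EXPECTED_FIELDS if name not in record]


# Hand-written record for illustration only, not real scraper output.
sample = {
    "article_topic": "Politique",
    "article_title": "Example title",
    "article_url": "https://example.com/article",
}

print(missing_fields(sample))
```

A non-empty result may indicate that the website layout changed and is worth reporting.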

Installation

With poetry

poetry add lefaso-net-scraper
poetry update lefaso-net-scraper  # to update the package

With pip

pip install --upgrade lefaso-net-scraper

Usage

# coding: utf-8

from lefaso_net_scraper import LefasoNetScraper

section_url = 'https://lefaso.net/spip.php?rubrique473'
scraper = LefasoNetScraper(section_url)
data = scraper.run()
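
Once `run()` has returned, the result can be inspected with ordinary Python. The sketch below assumes `data` is a list of dicts using the field names from the Data Format section; `count_by_topic` and the sample records are hypothetical and stand in for real scraper output.

```python
from collections import Counter


def count_by_topic(records):
    """Count how many records fall under each article_topic value.

    Hypothetical helper: assumes each record is a dict; records without
    an article_topic key are counted under None.
    """
    return Counter(record.get("article_topic") for record in records)


# Hand-written records for illustration only.
sample_data = [
    {"article_topic": "Politique", "article_title": "First"},
    {"article_topic": "Politique", "article_title": "Second"},
    {"article_topic": "Sport", "article_title": "Third"},
]

for topic, count in count_by_topic(sample_data).most_common():
    print(topic, count)
```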

Setting the pagination range

# coding: utf-8

from lefaso_net_scraper import LefasoNetScraper

section_url = 'https://lefaso.net/spip.php?rubrique473'
scraper = LefasoNetScraper(section_url)
scraper.set_pagination_range(start=20, stop=100)
data = scraper.run()

Save data to CSV

# coding: utf-8

from lefaso_net_scraper import LefasoNetScraper
import pandas as pd

section_url = 'https://lefaso.net/spip.php?rubrique473'
scraper = LefasoNetScraper(section_url)
data = scraper.run()
df = pd.DataFrame.from_records(data)
df.to_csv('path/to/df.csv')
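
If pandas is not available, the standard library can write the same kind of file. This is a rough sketch under the assumption that `data` is a list of flat dicts; `write_records_csv` and the sample records are hypothetical. Keys missing from a record are written as empty cells, and keys not listed in `fieldnames` are ignored.

```python
import csv
import io


def write_records_csv(records, fileobj, fieldnames):
    """Write a list of dict records to an open text file as CSV.

    Hypothetical helper: assumes flat dicts with string-like values.
    """
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)


# Hand-written records for illustration only.
sample_data = [
    {"article_title": "First", "article_url": "https://example.com/1"},
    {"article_title": "Second", "article_url": "https://example.com/2", "extra": "x"},
]

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_records_csv(sample_data, buffer, ["article_title", "article_url"])
print(buffer.getvalue())
```

When writing to a real file, open it with `open(path, "w", newline="", encoding="utf-8")`, as the `csv` module documentation recommends.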

We ❤ open source
