LinkedIn Crawler

LinkedIn Crawler is a Python package that extracts data from LinkedIn profiles and company pages. It uses Selenium WebDriver to drive the browser and BeautifulSoup to parse the resulting HTML.

Installation

Install the required packages:

pip install -r requirements.txt

Create a .env file in the project directory and add your LinkedIn credentials:

LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password
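
The README does not show how the package reads these values, so the following is only a minimal sketch of one way to load them with the standard library. The helper names load_env_file and get_credentials are hypothetical and are not part of this project; the project itself may rely on a different loader.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines and return them as a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an '=' separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so passwords may contain '='.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_credentials(path=".env"):
    """Return (username, password); real environment variables take priority."""
    file_values = load_env_file(path)
    username = os.environ.get("LINKEDIN_USERNAME") or file_values.get("LINKEDIN_USERNAME")
    password = os.environ.get("LINKEDIN_PASSWORD") or file_values.get("LINKEDIN_PASSWORD")
    if not username or not password:
        raise RuntimeError("LINKEDIN_USERNAME and LINKEDIN_PASSWORD must both be set")
    return username, password
```

Keep the .env file out of version control, since it holds a plain-text password.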

Usage

LinkedIn_Crawler

The LinkedIn_Crawler class is the base class for crawling LinkedIn profiles and company pages.

from Crawler import LinkedIn_Crawler

crawler = LinkedIn_Crawler()
crawler.start_driver(headless=True)
crawler.login()

Companies

The Companies class inherits from LinkedIn_Crawler and provides methods for extracting company information.

from Companies import Companies

company_crawler = Companies()
company_data = company_crawler.get_all_data_by_company('https://www.linkedin.com/company/example/')

UserCrawler

The UserCrawler class inherits from LinkedIn_Crawler and provides methods for extracting user profile information.

from UserCrawler import UserCrawler

user_crawler = UserCrawler()
# Reuses the logged-in WebDriver from the LinkedIn_Crawler example above.
user_data = user_crawler.get_all_user_data('https://www.linkedin.com/in/example/', crawler.driver)
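
The README does not specify what the get_all_* methods return. Assuming the result is a dictionary or list of JSON-compatible values, a sketch for saving it to disk could look like this. The helper name save_as_json is hypothetical and not part of the package.

```python
import json
from pathlib import Path


def save_as_json(data, path):
    """Write crawl results to a UTF-8 JSON file and return the output path."""
    out_path = Path(path)
    with out_path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII names readable in the file;
        # default=str stringifies values json cannot encode (e.g. dates).
        json.dump(data, f, ensure_ascii=False, indent=2, default=str)
    return out_path
```

For example, save_as_json(user_data, 'example_user.json') would write the profile data from the snippet above, provided the returned structure matches that assumption.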

Classes

LinkedIn_Crawler

__init__(self, driver=None): Initializes the LinkedIn_Crawler object.
start_driver(self, headless=True): Starts the WebDriver with the specified options.
login(self): Logs in to LinkedIn using the credentials from the .env file.
handle_security_verification(self): Handles security verification if prompted.
get_soup(self): Returns a BeautifulSoup object of the current page source.
is_company(self, input_link): Checks if the input link is a company page.
create_link(self, input): Creates a LinkedIn company page link from the input.
date_utc(self): Returns the current date in UTC.
load_posts_count(self, load=10): Loads the specified number of posts on a company page.
set_posts_filter(self): Sets the posts filter to "Recent".
crawl_posts(self): Crawls the posts on a company page.
select_posts(self): Selects the "Posts" tab on a company page.
load_posts(self): Loads the posts on a company page.
extract_data(self, container): Extracts data from a post container.
posts_crawler_process(self, posts_link): Crawls the posts on a company page using the specified link.
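
The descriptions of is_company and create_link above do not say how the checks are done. As an illustration only, the same ideas can be sketched with urllib.parse. The function names is_company_link and create_company_link are hypothetical, and the slug rule (lowercase, spaces replaced by hyphens) is an assumption rather than the package's actual behavior.

```python
from urllib.parse import urlparse


def is_company_link(input_link):
    """Return True if the URL points at a linkedin.com /company/ page."""
    parsed = urlparse(input_link)
    host = parsed.netloc.lower()
    on_linkedin = host == "linkedin.com" or host.endswith(".linkedin.com")
    # Drop empty segments produced by leading or trailing slashes.
    segments = [s for s in parsed.path.split("/") if s]
    return on_linkedin and len(segments) >= 2 and segments[0] == "company"


def create_company_link(name_or_link):
    """Build a company page URL from a name, or return a company URL unchanged."""
    if is_company_link(name_or_link):
        return name_or_link
    # Assumed slug rule: lowercase, spaces replaced by hyphens.
    slug = name_or_link.strip().strip("/").lower().replace(" ", "-")
    return f"https://www.linkedin.com/company/{slug}/"
```

Parsing the URL rather than searching for the substring "company" avoids false positives on profile links that merely contain that word.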

Companies

company_basics(self): Extracts basic company information.
get_all_data_by_company(self, input_link): Extracts all company data using the specified link.

UserCrawler

user_basic_info(self): Extracts basic user information.
user_edu_exp(self): Extracts user education and experience information.
get_user_experiences(self): Extracts user experiences.
get_all_user_data(self, input_link, driver): Extracts all user data using the specified link and WebDriver.

License

This project is licensed under the MIT License.
