LinkedIn Crawler

LinkedIn Crawler is a Python package that provides an easy way to extract data from LinkedIn profiles and company pages at scale, including general company information, employee data, rankings, posts and feed data, and a social media presence index, for investigating alternative causal relationships such as comparative asset analysis. It uses Selenium WebDriver to navigate pages and BeautifulSoup to parse the HTML content.

Installation

  1. Install the required packages:
pip install -r requirements.txt
  2. Create a .env file in the project directory and add your LinkedIn credentials:
LINKEDIN_USERNAME=your_email@example.com
LINKEDIN_PASSWORD=your_password
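
The login() method uses these credentials at runtime. As a minimal sketch of how to confirm they are readable from the .env file, assuming python-dotenv is available (a common choice for this, though not confirmed by this README):

import os
from dotenv import load_dotenv

# Load LINKEDIN_USERNAME and LINKEDIN_PASSWORD from the .env file in the project directory
load_dotenv()
username = os.getenv('LINKEDIN_USERNAME')
password = os.getenv('LINKEDIN_PASSWORD')
assert username and password, 'LinkedIn credentials are missing from .env'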

Usage

LinkedIn_Crawler

The LinkedIn_Crawler class is the base class for crawling LinkedIn profiles and company pages.

from Crawler import LinkedIn_Crawler

crawler = LinkedIn_Crawler()
crawler.start_driver(headless=True)
crawler.login()
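
After logging in, get_soup() returns a BeautifulSoup object for whatever page the driver is currently on. A minimal sketch (the feed URL is only an illustrative target):

# Navigate with the underlying Selenium driver, then parse the current page source
crawler.driver.get('https://www.linkedin.com/feed/')
soup = crawler.get_soup()
print(soup.title)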

Companies

The Companies class inherits from LinkedIn_Crawler and provides methods for extracting company information.

from Companies import Companies

company_crawler = Companies()
company_data = company_crawler.get_all_data_by_company('https://www.linkedin.com/company/example/')
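
Since Companies inherits from LinkedIn_Crawler, the same instance can start a driver, log in, and crawl several companies in one session. A minimal sketch (the URLs are illustrative, and establishing the session via the inherited start_driver() and login() is an assumption of this sketch):

from Companies import Companies

company_crawler = Companies()
company_crawler.start_driver(headless=True)
company_crawler.login()

# Illustrative list of company page URLs to crawl
company_links = [
    'https://www.linkedin.com/company/example-one/',
    'https://www.linkedin.com/company/example-two/',
]

company_results = [company_crawler.get_all_data_by_company(link) for link in company_links]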

UserCrawler

The UserCrawler class inherits from LinkedIn_Crawler and provides methods for extracting user profile information.

from UserCrawler import UserCrawler

user_crawler = UserCrawler()
# Reuses the WebDriver from the logged-in LinkedIn_Crawler instance created above
user_data = user_crawler.get_all_user_data('https://www.linkedin.com/in/example/', crawler.driver)

Classes

LinkedIn_Crawler

  • __init__(self, driver=None): Initializes the LinkedIn_Crawler object.
  • start_driver(self, headless=True): Starts the WebDriver with the specified options.
  • login(self): Logs in to LinkedIn using the credentials from the .env file.
  • handle_security_verification(self): Handles security verification if prompted.
  • get_soup(self): Returns a BeautifulSoup object of the current page source.
  • is_company(self, input_link): Checks if the input link is a company page.
  • create_link(self, input): Creates a LinkedIn company page link from the input.
  • date_utc(self): Returns the current date in UTC format.
  • load_posts_count(self, load=10): Loads the specified number of posts on a company page.
  • set_posts_filter(self): Sets the posts filter to "Recent".
  • crawl_posts(self): Crawls the posts on a company page.
  • select_posts(self): Selects the "Posts" tab on a company page.
  • load_posts(self): Loads the posts on a company page.
  • extract_data(self, container): Extracts data from a post container.
  • posts_crawler_process(self, posts_link): Crawls the posts on a company page using the specified link.
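
Taken together, these methods support crawling a company's posts end to end. A minimal sketch, assuming posts_crawler_process() accepts a posts URL and returns the crawled data (the URL is illustrative):

from Crawler import LinkedIn_Crawler

crawler = LinkedIn_Crawler()
crawler.start_driver(headless=True)
crawler.login()

# Illustrative posts URL for a company page
posts_link = 'https://www.linkedin.com/company/example/posts/'

# posts_crawler_process() is expected to select the Posts tab, apply the "Recent" filter,
# load posts, and extract data from each post container
posts = crawler.posts_crawler_process(posts_link)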

Companies

  • company_basics(self): Extracts basic company information.
  • get_all_data_by_company(self, input_link): Extracts all company data using the specified link.

UserCrawler

  • user_basic_info(self): Extracts basic user information.
  • user_edu_exp(self): Extracts user education and experience information.
  • get_user_experiences(self): Extracts user experiences.
  • get_all_user_data(self, input_link, driver): Extracts all user data using the specified link and WebDriver.
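
get_all_user_data() takes the WebDriver of an already logged-in crawler, so one session can be reused across many profiles. A minimal sketch (the profile URLs are illustrative):

from Crawler import LinkedIn_Crawler
from UserCrawler import UserCrawler

# Log in once with the base crawler and reuse its WebDriver for each profile
crawler = LinkedIn_Crawler()
crawler.start_driver(headless=True)
crawler.login()

user_crawler = UserCrawler()

# Illustrative list of profile URLs to crawl
profile_links = [
    'https://www.linkedin.com/in/example-one/',
    'https://www.linkedin.com/in/example-two/',
]

user_results = [user_crawler.get_all_user_data(link, crawler.driver) for link in profile_links]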

License

This project is licensed under the MIT License.
