# Organizing Companies by Industry - Jupyter Notebook
This notebook documents the process of organizing companies scraped from LinkedIn by their **industry** into separate Xlsx files.

Because the data was scraped from search results, it already contains no duplicates, so no advanced data manipulation is needed. This keeps the task simple and quick.

The purpose of this notebook is to demonstrate how to efficiently manage and organize large datasets, particularly when dealing with **categorical data** such as **industry types**.


# First Step:
To begin, I set up the environment by importing the necessary libraries. In this case, I opted for the ***Pandas*** library.

In [1]:
import pandas as pd

# Second Step:
In the next step, the program needs to **read** the scraped data already stored in the file "linkedin.csv" and load it into a data frame.

---



In [None]:
scraped_data = pd.read_csv('linkedin.csv')
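
Before grouping, it can help to confirm that the loaded data looks as expected. The following is a minimal sketch, not part of the original workflow: it uses a small in-memory sample in place of "linkedin.csv", and the `Company` column and all row values are assumptions made for illustration. Only the `Industry` column name comes from this notebook.

```python
import io
import pandas as pd

# Hypothetical sample standing in for 'linkedin.csv'; the 'Company'
# column and the row values are assumptions for illustration only.
sample_csv = io.StringIO(
    "Company,Industry\n"
    "Acme,Software\n"
    "Globex,Banking\n"
    "Initech,Software\n"
)
sample_data = pd.read_csv(sample_csv)

# Confirm the column used for grouping is present.
has_industry = 'Industry' in sample_data.columns

# Count fully duplicated rows; 0 is expected if the scrape has no duplicates.
duplicate_count = int(sample_data.duplicated().sum())

# Count rows with no industry; groupby skips these rows by default.
missing_industry = int(sample_data['Industry'].isna().sum())
```

Running the same three checks on `scraped_data` would verify the no-duplicates assumption and reveal any companies that would be left out of the output files because their industry is missing.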

# Third Step:
The third step is to organize the data so that it is grouped by the Industry of each company.

In [None]:
industry_groups = scraped_data.groupby('Industry')
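
To see what a grouped object contains, here is a small sketch on made-up rows. The `Company` column and the values are hypothetical; only the `Industry` column name is taken from this notebook.

```python
import pandas as pd

# Hypothetical rows; only the 'Industry' column name comes from the notebook.
sample_data = pd.DataFrame({
    'Company': ['Acme', 'Globex', 'Initech'],
    'Industry': ['Software', 'Banking', 'Software'],
})

sample_groups = sample_data.groupby('Industry')

# Number of companies per industry, as a Series indexed by industry name.
group_sizes = sample_groups.size()

# Pull out the rows of a single industry as a regular DataFrame.
software_companies = sample_groups.get_group('Software')
```

The same calls on `industry_groups` would show how many companies fall into each industry before any files are written.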


# Fourth Step:
This step saves each industry's group to its own Xlsx file, named after the industry. This is more convenient for the user and avoids time spent searching the full dataset for the desired data.

In [None]:
for industry, group in scraped_data.groupby('Industry'):
    filename = f'{industry}_companies.xlsx'
    group.to_excel(filename, index=False)
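
One thing to watch for: an industry name may contain characters that are not allowed in file names, such as `/`. The sketch below shows one possible way to handle this. The helper `safe_filename` is hypothetical and not part of the original code, and the set of replaced characters is an assumption based on common file system restrictions.

```python
import re

def safe_filename(industry):
    """Hypothetical helper: build a file name from an industry label."""
    # Replace characters that are invalid in file names on common systems.
    cleaned = re.sub(r'[\\/:*?"<>|]+', '_', str(industry)).strip()
    return f'{cleaned}_companies.xlsx'
```

In the loop above, `filename = safe_filename(industry)` could take the place of the f-string. Note also that `to_excel` needs an Excel writer engine such as `openpyxl` to be installed.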


## Conclusion

In this notebook, we've successfully organized companies scraped from LinkedIn by their industry into separate Xlsx files.

This process allows for better management and analysis of the data, making it easier to perform tasks such as industry-specific analysis or data visualization.



For reference, here's the complete code for organizing the data.

In [None]:
import pandas as pd

# Read the CSV file generated by your scraping script
scraped_data = pd.read_csv('linkedin.csv')

# Organize companies with the same industry into one separate file
for industry, group in scraped_data.groupby('Industry'):
    filename = f'{industry}_companies.xlsx'
    group.to_excel(filename, index=False)
