Goal: A data analysis project required understanding and analyzing the relationships between entities within an organization, specifically projects, employees, and their seniority levels. Built a structured, analyzable data model in Python that connects these datasets and supports insights into project staffing, seniority distributions, and performance trends.
Description: Created three distinct Pandas DataFrames (Project, Employee, and Seniority Level) in Python and applied advanced data wrangling techniques, including:
- Merging datasets to form a relational view
- Filtering data based on roles, departments, and experience
- Aggregating metrics such as average seniority and employee count per project
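As a minimal sketch of the merge, filter, and aggregate steps listed above, the example below builds three tiny DataFrames and joins them into one relational view. The column names (project_id, seniority_id, level_rank, years_experience, and so on) and all sample values are hypothetical; the notebook's actual schema is not shown here.

```python
import pandas as pd

# Hypothetical sample tables; column names and values are illustrative only.
projects = pd.DataFrame({
    "project_id": [1, 2],
    "project_name": ["Alpha", "Beta"],
    "department": ["Engineering", "Marketing"],
})
employees = pd.DataFrame({
    "employee_id": [101, 102, 103, 104],
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "role": ["Developer", "Analyst", "Developer", "Manager"],
    "project_id": [1, 1, 2, 2],
    "seniority_id": [1, 2, 3, 3],
    "years_experience": [2, 5, 9, 12],
})
seniority = pd.DataFrame({
    "seniority_id": [1, 2, 3],
    "level": ["Junior", "Mid", "Senior"],
    "level_rank": [1, 2, 3],  # assumed numeric rank so seniority can be averaged
})

# Merge: each employee maps to exactly one seniority level and one project.
# validate= raises an error if the lookup tables contain duplicate keys.
staffing = (
    employees
    .merge(seniority, on="seniority_id", how="left", validate="many_to_one")
    .merge(projects, on="project_id", how="left", validate="many_to_one")
)

# Filter: developers in Engineering with at least 2 years of experience.
eng_devs = staffing[
    (staffing["role"] == "Developer")
    & (staffing["department"] == "Engineering")
    & (staffing["years_experience"] >= 2)
]

# Aggregate: head count and average seniority rank per project.
summary = (
    staffing.groupby("project_name")
    .agg(employee_count=("employee_id", "count"),
         avg_seniority_rank=("level_rank", "mean"))
    .reset_index()
)
print(summary)
```

Using `how="left"` keeps every employee even if a lookup key were missing, which makes gaps visible as NaN rather than silently dropping rows.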
Used NumPy for efficient numerical operations and data transformation where needed. Ensured data consistency and optimized memory usage for scalability.
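The NumPy and memory-optimization points above could look roughly like the sketch below. It is illustrative only, not the project's code: the column names and the experience thresholds used for banding are assumptions. `np.select` assigns a band to every row in one vectorized call, and converting low-cardinality text to the `category` dtype plus downcasting integers reduces the frame's reported memory.

```python
import numpy as np
import pandas as pd

# Hypothetical employee table; columns are assumptions for illustration.
employees = pd.DataFrame({
    "employee_id": np.arange(1, 7, dtype=np.int64),
    "department": ["Engineering", "Marketing", "Engineering",
                   "Engineering", "Marketing", "Sales"],
    "years_experience": np.array([1, 4, 7, 12, 3, 9], dtype=np.int64),
})

# Vectorized banding with NumPy instead of a Python loop.
# Assumed thresholds: <3 years -> Junior, 3-7 -> Mid, 8+ -> Senior.
employees["band"] = np.select(
    [employees["years_experience"] < 3, employees["years_experience"] < 8],
    ["Junior", "Mid"],
    default="Senior",
)

before = employees.memory_usage(deep=True).sum()

# Low-cardinality strings -> category (each distinct value stored once).
employees["department"] = employees["department"].astype("category")
employees["band"] = employees["band"].astype("category")
# Integers -> smallest integer type that fits the values.
employees["years_experience"] = pd.to_numeric(
    employees["years_experience"], downcast="integer"
)

after = employees.memory_usage(deep=True).sum()
print(f"memory: {before} -> {after} bytes")
```

On larger tables the savings from `category` grow with row count, since rows hold only small integer codes that point into a single list of distinct values.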
Result: Developed a clean, relational dataset that enabled efficient analysis across multiple dimensions (project-level, employee-level, and seniority-level). This structure allowed for faster reporting, easier visualization, and more informed decision-making in resource allocation and project planning.
Code: http://localhost:8888/notebooks/Documents/Manjiri%20Study/PythonCapstone.ipynb?
Technology: Python, Pandas, NumPy, Jupyter Notebook
Skills: Data wrangling and preprocessing, Merging and joining DataFrames, Filtering and conditional selection, Grouping and aggregation, Data modeling for multi-table relationship analysis