As a new data engineer, the challenge was a research project on Pewlett Hackard employees from the 1980s and 1990s using SQL: data modeling, data engineering, and data anlysis. The challenge utlizized data modeling, engineering, and analysis to build a SQL database with six tables. Using employee data from historical csv files, the database and tables were sucessfully imported.
- Inspect CSV files
- Utilize QuickDatabase to create an ERD of the schema/tabses
- Export foundational Postgresql query from QuickDBD
- Modify postgresql query to best align data types, primary keys, foreign keys, and other query contraints. -- Identified composite keys, if primary key was not unique to table. Review table: dept_emp, dept_manager
- Tables were created in the correct order to ensure table has correct foreign key
- Import CSV files corresponding with how postgresql tables were created
- List the following details of each employee: employee number, last name, first name, sex, and salary
- List first name, last name, and hire date for employees who were hired in 1986
- List the manager of each department with the following information: department number, department name, the manager's employee number, last name, first name
- List the department of each employee with the following information: employee number, last name, first name, and department name
- List first name, last name, and sex for employees whose first name is "Hercules" and last names begin with "B"
- List all employees in the Sales department, including their employee number, last name, first name, and department name
- List all employees in the Sales and Development departments, including their employee number, last name, first name, and department name
- In descending order, list the frequency count of employee last names, i.e., how many employees share each last name
As you examine the data, you are overcome with a creeping suspicion that the dataset is fake. You surmise that your boss handed you spurious data in order to test the data engineering skills of a new employee. To confirm your hunch, you decide to take the following steps to generate a visualization of the data, with which you will confront your boss:
-
Import SQL database into Pandas
-
Create a histogram to visualize the most common salary ranges for employees.
- Create a bar chart of average salary by title.