Simple analysis of a car sales dataset: specific actions performed to clean the data in Excel, followed by analysis with formulas, pivot tables, charts and spreadsheet visualizations.
Saved the .csv as a .txt file (sometimes my Excel does NOT convert a .csv file directly), imported it into Excel, then converted it to an .xlsx file.
- some alignment and spell checking in Manufacturer and Model (one correction for the Cadillac Escalade in car models),
- deleted both Saab models, as they are NOT known models and it is impossible to tell which models they refer to,
- formatted columns C, G, H, I, J, K, L, M, N and P as numbers,
- formatted columns C, G, H, I, J, K, M, N and P with no decimal places (with ,),
- changed all text in column E to Passenger,
- deleted the row for the Chrysler Town & Country model, as too many values were missing,
- for column C, Sales in Thousands: manually corrected some values containing a point by removing the dot (.) and adding a zero, which looks reasonable; then renamed the column to Unit Sales,
- for column D, Price in Thousands: for all values with only 1 digit after the dot (.), added 2 zeros; for all values with only 2 digits after the dot (.), added 1 zero; for all values with 2 digits only and no dot, added the dot (.) plus 3 zeros; all manual and very boring, but necessary for the analysis; the price was missing in row 4 (Acura CL), so that row was deleted; at the end, renamed the column to Price and formatted it as Currency (USD),
- for column D, Year Resale Value: for all values with only 1 digit after the dot (.), added 2 zeros; for all values with only 2 digits after the dot (.), added 1 zero; for all values with 2 digits only and no dot, added the dot (.) plus 3 zeros,
- after that, moved column D to new column E and column F to new column D, for better readability,
- for column L (Curb Weight): for all values with only 1 digit after the dot (.), added 2 zeros; for all values with only 2 digits after the dot (.), added 1 zero; for all values with 2 digits only and no dot, added the dot (.) plus 3 zeros; a Google search (Wikipedia) for the missing Cadillac Seville value gave 3900, which was inserted into the column,
- at the end, please see the original sheet and the cleaned data sheet.
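The manual zero-padding described for the Price, Year Resale Value and Curb Weight columns could also be scripted. The following is only a minimal Python sketch, not the method used in this project: the function names `pad_to_three_decimals` and `to_whole_units` are hypothetical, and it assumes that every value is a number in thousands written with a dot and at most three digits after it.

```python
def pad_to_three_decimals(text: str) -> str:
    """Pad a value such as '21.5' or '42' to three digits after the dot."""
    value = text.strip()
    if not value:
        # Missing values are not guessed here; they need a manual decision.
        raise ValueError("missing value")
    whole, _, frac = value.partition(".")
    if len(frac) > 3:
        raise ValueError(f"more than three digits after the dot: {value!r}")
    # '5' -> '500', '45' -> '450', '' -> '000'
    return f"{whole}.{frac.ljust(3, '0')}"


def to_whole_units(text: str) -> int:
    """Assumption: the padded value in thousands is read as whole units."""
    return int(pad_to_three_decimals(text).replace(".", ""))


print(pad_to_three_decimals("21.5"))
print(to_whole_units("42"))
```

Rows with a missing value raise an error instead of being filled in, which mirrors the manual choice between deleting the row and looking the value up.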
First analysis with a new column, using the formulas IFS, COUNTIF, SUMIFS and VLOOKUP.
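For readers who want to reproduce this kind of formula logic outside Excel, here is a rough Python sketch of what IFS, COUNTIF, SUMIFS and an exact-match VLOOKUP do. The sample rows, the field names and the price thresholds are all hypothetical and are not taken from the workbook.

```python
# Hypothetical sample rows; the values are illustrative, not from the dataset.
rows = [
    {"manufacturer": "A", "model": "A1", "unit_sales": 10000, "price": 20000},
    {"manufacturer": "A", "model": "A2", "unit_sales": 5000, "price": 45000},
    {"manufacturer": "B", "model": "B1", "unit_sales": 8000, "price": 30000},
]


def price_band(price: float) -> str:
    """Like IFS: the first condition that is true decides the result."""
    # The thresholds are assumptions chosen only for this example.
    if price < 25000:
        return "low"
    if price < 40000:
        return "mid"
    return "high"


# Like COUNTIF: count the rows that match one condition.
count_a = sum(1 for r in rows if r["manufacturer"] == "A")

# Like SUMIFS: sum one column over the rows that match several conditions.
sales_a_low = sum(
    r["unit_sales"]
    for r in rows
    if r["manufacturer"] == "A" and price_band(r["price"]) == "low"
)

# Like VLOOKUP with exact match: look a value up by its key.
price_by_model = {r["model"]: r["price"] for r in rows}
price_b1 = price_by_model.get("B1")
```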
1 table and 7 different pivot tables with chart visualizations
2 spreadsheets with several charts: the first spreadsheet covers total unit sales and total sales value per manufacturer, the second covers an analysis of price and year retention value
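The per-manufacturer totals behind the first spreadsheet are the kind of result a pivot table produces. A minimal Python sketch of that grouping follows. The sample rows and the function name `totals_per_manufacturer` are hypothetical, and it assumes that sales value is computed as unit sales multiplied by price, which the workbook may define differently.

```python
from collections import defaultdict

# Hypothetical rows: (manufacturer, unit sales, price in USD).
rows = [
    ("A", 10000, 20000.0),
    ("A", 5000, 45000.0),
    ("B", 8000, 30000.0),
]


def totals_per_manufacturer(rows):
    """Group rows by manufacturer and sum unit sales and sales value."""
    totals = defaultdict(lambda: {"unit_sales": 0, "sales_value": 0.0})
    for manufacturer, unit_sales, price in rows:
        totals[manufacturer]["unit_sales"] += unit_sales
        # Assumption: sales value = unit sales * price.
        totals[manufacturer]["sales_value"] += unit_sales * price
    return dict(totals)


summary = totals_per_manufacturer(rows)
for manufacturer, total in sorted(summary.items()):
    print(manufacturer, total["unit_sales"], total["sales_value"])
```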