# Explore Laptop Repair Data

- [View Solution Notebook](./solution.html)
- [View Project Page](https://www.codecademy.com/projects/practice/explore-laptop-repair-data-with-python)

## Task Group 1 - Import and Inspect

### Task 1

Import `pandas` using the alias `pd`.

In [1]:
import pandas as pd

### Task 2

Import the dataset in `laptops.csv` and assign to the variable `laptops`.

In [2]:
laptops = pd.read_csv('laptops.csv')

### Task 3

Display the first five rows of the `laptops` DataFrame.

In [3]:
laptops.head()

| | Fault type | Fixed | Repairable | End of life | Unknown | Repair attempts | % of total | Product Age | Count |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Performance | 66% | 28% | 5% | 2% | 398 | 15.70% | 0 | 15 |
| 1 | Power/battery | 35% | 43% | 21% | 1% | 316 | 12.50% | 1 | 26 |
| 2 | Configuration | 66% | 25% | 5% | 4% | 295 | 11.70% | 2 | 72 |
| 3 | Integrated screen | 31% | 53% | 12% | 5% | 257 | 10.20% | 3 | 68 |
| 4 | Internal storage | 65% | 24% | 8% | 3% | 184 | 7.30% | 4 | 103 |

### Task 4

Display the data types of the `laptops` DataFrame columns.

In [4]:
laptops.dtypes

Fault type         object
Fixed              object
Repairable         object
End of life        object
Unknown            object
Repair attempts     int64
% of total         object
Product Age         int64
Count               int64
dtype: object
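
The percentage columns are reported as `object` because their values are strings such as `66%`. A minimal sketch of one way to make them numeric, using a small hand-built frame (`sample` is a hypothetical name, and its values are copied from the `head()` output above, not read from `laptops.csv`):

```python
import pandas as pd

# Toy frame reusing the first three rows shown in the head() output.
sample = pd.DataFrame({
    "Fault type": ["Performance", "Power/battery", "Configuration"],
    "Fixed": ["66%", "35%", "66%"],
    "% of total": ["15.70%", "12.50%", "11.70%"],
})

# Strip the trailing "%" and convert what remains to float.
for col in ["Fixed", "% of total"]:
    sample[col] = sample[col].str.rstrip("%").astype(float)

print(sample.dtypes)
```

After the conversion, methods such as `describe()` treat these columns as numeric.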

## Task Group 2 - Explore Numeric Columns

### Task 5

Let's take a look at the ages of laptops being brought in for repair. Use a series method to display the minimum, maximum, and other summary statistics for the `Product Age` column. Do you notice anything interesting?

In [5]:
laptops['Product Age'].describe()

count    19.000000
mean      9.105263
std       5.801290
min       0.000000
25%       4.500000
50%       9.000000
75%      13.500000
max      19.000000
Name: Product Age, dtype: float64

### Task 6

The other numeric column is `event_year`. Use a pandas method to determine the earliest and latest years in the dataset.

In [6]:
laptops['event_year'].describe()

count    3511.000000
mean     2017.583594
std         1.691374
min      2012.000000
25%      2017.000000
50%      2018.000000
75%      2019.000000
max      2020.000000
Name: event_year, dtype: float64
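
`describe()` reports the range alongside other statistics. When only the endpoints are needed, `min()` and `max()` return them directly. A minimal sketch on a made-up `years` Series (the values are illustrative and not taken from `laptops.csv`):

```python
import pandas as pd

# Illustrative values only; the real column lives in the project dataset.
years = pd.Series([2012, 2017, 2018, 2019, 2020, 2018], name="event_year")

earliest = years.min()  # smallest year in the Series
latest = years.max()    # largest year in the Series

# agg() returns both endpoints in one labelled Series.
span = years.agg(["min", "max"])

print(earliest, latest)
print(span)
```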

## Task Group 3 - Explore Categorical Columns

### Task 7

Let's also take a look at `event_year` as a categorical column. Use a series method to output the number of laptops in the data for each `event_year`. What do you notice?

In [7]:
laptops['event_year'].value_counts()

KeyError: 'event_year'
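
A `KeyError` like the one above means the requested label is not among the DataFrame's columns. A minimal sketch of checking for a column before selecting it, on a toy frame (`frame` and `wanted` are hypothetical names, and the column labels are a subset of those in the `head()` output):

```python
import pandas as pd

# Toy frame with a few of the column labels shown by head().
frame = pd.DataFrame({
    "Fault type": ["Performance"],
    "Product Age": [0],
    "Count": [15],
})

wanted = "event_year"
has_column = wanted in frame.columns  # membership test on the column index

if has_column:
    print(frame[wanted].value_counts())
else:
    print(f"{wanted!r} not found; available columns: {list(frame.columns)}")
```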

### Task 8

Now, let's focus on the problems causing people to bring in laptops for repair. Use a series method to display the most common problems in the dataset. What do you notice?

In [8]:
laptops['problem'].value_counts()

unknown                       980
performance                   398
power/battery                 316
configuration                 295
integrated screen             257
internal storage              184
operating system              177
boot                          171
ports/slots/connectors        127
overheating                   107
integrated keyboard           106
case/chassis                   75
system board                   72
internal damage                59
virus/malware                  57
other                          32
integrated media component     27
integrated pointing device     25
multiple                       23
integrated optical drive       23
Name: problem, dtype: int64

### Task 9

Power and battery issues are pretty common, but what percentage of the data do they represent? Modify the method from the previous task to output percentages instead of counts.

In [9]:
laptops['problem'].value_counts(normalize=True) * 100

unknown                       27.912276
performance                   11.335802
power/battery                  9.000285
configuration                  8.402165
integrated screen              7.319852
internal storage               5.240672
operating system               5.041299
boot                           4.870407
ports/slots/connectors         3.617203
overheating                    3.047565
integrated keyboard            3.019083
case/chassis                   2.136144
system board                   2.050698
internal damage                1.680433
virus/malware                  1.623469
other                          0.911421
integrated media component     0.769012
integrated pointing device     0.712048
multiple                       0.655084
integrated optical drive       0.655084
Name: problem, dtype: float64
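
With `normalize=True`, `value_counts()` divides each count by the total number of non-null values, so multiplying by 100 gives percentages. In the output above, `power/battery` at 316 of 3511 rows works out to about 9.0%. A minimal sketch on a four-row toy Series (the values are made up) showing that the two approaches agree:

```python
import pandas as pd

# Made-up data: two "performance" rows out of four.
problems = pd.Series(
    ["performance", "performance", "boot", "unknown"], name="problem"
)

counts = problems.value_counts()
shares = problems.value_counts(normalize=True) * 100

# Same result computed by hand from the raw counts.
manual = counts / counts.sum() * 100

print(shares)
```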

### Task 10

Lastly, let's look at how often laptops brought into these events are recorded as fixed. Use a pandas method to count the number of laptops in each category of `repair_status`.

In [10]:
laptops['repair_status'].value_counts()

fixed          1835
repairable     1146
end of life     423
unknown         107
Name: repair_status, dtype: int64
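
To express these counts as shares, divide each by the total. A minimal sketch that rebuilds the counts shown above by hand (the Series `status_counts` is constructed here for illustration and is not read from the dataset):

```python
import pandas as pd

# Counts copied from the value_counts() output above.
status_counts = pd.Series(
    {"fixed": 1835, "repairable": 1146, "end of life": 423, "unknown": 107},
    name="repair_status",
)

total = status_counts.sum()
status_shares = status_counts / total * 100  # percentage per category

print(status_shares.round(2))
```

On these figures, a little over half of the recorded laptops were fixed.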