# Explore Laptop Repair Data

- [View Solution Notebook](./solution.html)
- [View Project Page](https://www.codecademy.com/projects/practice/explore-laptop-repair-data-with-python)

## Task Group 1 - Import and Inspect

### Task 1 

Import `pandas` using the alias `pd`.

In [1]:
import pandas as pd

### Task 2

Load the dataset in `laptops.csv` and assign it to the variable `laptops`.

In [2]:
laptops = pd.read_csv('laptops.csv')

### Task 3

Display the first five rows of the `laptops` DataFrame.

In [3]:
laptops.head()

|   | problem | brand | repair_status | event_year | age |
|---|---|---|---|---|---|
| 0 | unknown | lenovo | fixed | 2019 | 0.0 |
| 1 | virus/malware | lenovo | fixed | 2018 | 0.0 |
| 2 | operating system | geo | fixed | 2019 | 0.0 |
| 3 | multiple | hp | fixed | 2019 | 0.0 |
| 4 | virus/malware | hp | fixed | 2018 | 0.0 |


### Task 4

Display the data types of the `laptops` DataFrame columns.

In [4]:
laptops.dtypes

problem           object
brand             object
repair_status     object
event_year         int64
age              float64
dtype: object
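
The dtypes already hint at the split used in the next two task groups: `event_year` and `age` are numeric, while the `object` columns hold text categories. As a minimal sketch, `select_dtypes` can make that split explicit. The `sample` DataFrame below is an invented stand-in for `laptops`, and the variable names are illustrative only.

```python
import pandas as pd

# Invented stand-in for `laptops`; only the column names match the real data.
sample = pd.DataFrame({
    "problem": ["unknown", "boot", "overheating"],
    "brand": ["lenovo", "hp", "dell"],
    "repair_status": ["fixed", "repairable", "fixed"],
    "event_year": [2019, 2018, 2017],
    "age": [0.0, 3.0, 5.0],
})

# Numeric columns versus everything else (the text/categorical columns).
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```

On the real data, the same two calls would be made on `laptops` instead of `sample`.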

## Task Group 2 - Explore Numeric Columns

### Task 5

Let's take a look at the ages of laptops being brought in for repair. Use a Series method to display the minimum, maximum, and other summary statistics for the `age` column. Do you notice anything interesting?

In [5]:
laptops['age'].describe()

count    815.000000
mean       6.136196
std        3.600758
min        0.000000
25%        4.000000
50%        5.000000
75%        8.000000
max       29.000000
Name: age, dtype: float64
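
One detail worth noticing in the summary above is the `count` for `age`: 815, well below the 3,511 laptops counted per `event_year` in Task 7. `describe()` counts only non-missing values, so most rows appear to have no recorded age. A minimal sketch of how to check this directly, using an invented `sample` DataFrame rather than the real `laptops` data:

```python
import pandas as pd

# Invented rows for illustration; two of the four ages are missing.
sample = pd.DataFrame({
    "event_year": [2019, 2018, 2019, 2017],
    "age": [0.0, float("nan"), float("nan"), 5.0],
})

total_rows = len(sample)                    # every row, missing or not
non_missing = int(sample["age"].count())    # what describe() reports as count
missing = int(sample["age"].isna().sum())   # rows with no recorded age

print(f"{non_missing} of {total_rows} rows have an age; {missing} are missing")
```

On the real data, `laptops['age'].isna().sum()` would report how many laptops have no age recorded.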


### Task 6

The other numeric column is `event_year`. Use a pandas method to determine the earliest and latest years in the dataset.

In [6]:
earliest_year = laptops['event_year'].min()
latest_year = laptops['event_year'].max()

print("Earliest year:", earliest_year)
print("Latest year:", latest_year)

Earliest year: 2012
Latest year: 2020
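
As an optional variation, both values can be requested in one call with `Series.agg`. The sketch below uses an invented `years` Series in place of `laptops['event_year']`.

```python
import pandas as pd

# Invented years for illustration; the real column is laptops['event_year'].
years = pd.Series([2019, 2012, 2020, 2018], name="event_year")

# agg() with a list of function names returns a Series indexed by those names.
year_range = years.agg(["min", "max"])
print(year_range)
```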


## Task Group 3 - Explore Categorical Columns

### Task 7

Let's also take a look at `event_year` as a categorical column. Use a Series method to count the laptops recorded for each `event_year`. What do you notice?

In [7]:
event_year_counts = laptops['event_year'].value_counts()

event_year_counts

2019    1116
2018     690
2017     686
2016     362
2020     231
2014     177
2015     162
2013      86
2012       1
Name: event_year, dtype: int64
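
`value_counts()` sorts by frequency, which makes the trend over time harder to read. One way to see the years in chronological order is to sort the result by its index. The sketch below rebuilds the counts shown above as a Series so that it runs on its own; in the notebook, `laptops['event_year'].value_counts().sort_index()` should give the same ordering.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
event_year_counts = pd.Series(
    {2019: 1116, 2018: 690, 2017: 686, 2016: 362, 2020: 231,
     2014: 177, 2015: 162, 2013: 86, 2012: 1},
    name="event_year",
)

# Sort by the index (the year) instead of by the count.
by_year = event_year_counts.sort_index()
print(by_year)
```

Read in this order, the counts grow almost every year through 2019, with a single record in 2012 and a much smaller total in 2020.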

### Task 8

Now, let's focus on the problems causing people to bring in laptops for repair. Use a Series method to display the most common problems in the dataset. What do you notice?

In [8]:
problem_counts = laptops['problem'].value_counts()

problem_counts

unknown                       980
performance                   398
power/battery                 316
configuration                 295
integrated screen             257
internal storage              184
operating system              177
boot                          171
ports/slots/connectors        127
overheating                   107
integrated keyboard           106
case/chassis                   75
system board                   72
internal damage                59
virus/malware                  57
other                          32
integrated media component     27
integrated pointing device     25
multiple                       23
integrated optical drive       23
Name: problem, dtype: int64

### Task 9

Power and battery issues are pretty common, but what percentage of the data do they represent? Modify the method from the previous task to output percentages instead of counts.

In [9]:
problem_percentages = laptops['problem'].value_counts(normalize=True) * 100

problem_percentages

unknown                       27.912276
performance                   11.335802
power/battery                  9.000285
configuration                  8.402165
integrated screen              7.319852
internal storage               5.240672
operating system               5.041299
boot                           4.870407
ports/slots/connectors         3.617203
overheating                    3.047565
integrated keyboard            3.019083
case/chassis                   2.136144
system board                   2.050698
internal damage                1.680433
virus/malware                  1.623469
other                          0.911421
integrated media component     0.769012
integrated pointing device     0.712048
multiple                       0.655084
integrated optical drive       0.655084
Name: problem, dtype: float64
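
To make the role of `normalize=True` concrete: it divides each count by the total number of non-missing values, so multiplying by 100 gives percentages. The sketch below checks that equivalence and looks up a single category by label. The `problems` Series is invented for illustration and is not taken from `laptops.csv`.

```python
import pandas as pd

# Invented problem labels for illustration, not rows from laptops.csv.
problems = pd.Series(
    ["power/battery", "boot", "power/battery", "performance"],
    name="problem",
)

counts = problems.value_counts()
percentages = problems.value_counts(normalize=True) * 100

# normalize=True is equivalent to dividing each count by the total.
manual = counts / counts.sum() * 100

# Look up one category by its label and round for display.
power_battery_pct = round(percentages["power/battery"], 1)
print(f"power/battery: {power_battery_pct}%")
```

In the notebook, `problem_percentages['power/battery']` would return the 9.0% figure shown in the output above.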

### Task 10

Lastly, let's look at how often laptops brought into these events are recorded as fixed. Use a pandas method to count the number of laptops in each category of `repair_status`.

In [10]:
repair_status_counts = laptops['repair_status'].value_counts()

repair_status_counts

fixed          1835
repairable     1146
end of life     423
unknown         107
Name: repair_status, dtype: int64
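
To turn these counts into an answer to "how often are laptops recorded as fixed?", the counts can be converted to shares. The sketch below rebuilds the counts shown above so that it runs on its own, and computes the fixed share both over all laptops and over laptops with a known outcome. Setting aside the `unknown` category is an assumption made here for illustration, not a step the project asks for.

```python
import pandas as pd

# Counts copied from the repair_status output above.
repair_status_counts = pd.Series(
    {"fixed": 1835, "repairable": 1146, "end of life": 423, "unknown": 107},
    name="repair_status",
)

# Share of all laptops recorded as fixed.
fixed_share = repair_status_counts["fixed"] / repair_status_counts.sum() * 100

# Same share after setting aside laptops whose outcome is unknown.
known = repair_status_counts.drop("unknown")
fixed_share_known = known["fixed"] / known.sum() * 100

print(f"Fixed, all rows: {fixed_share:.1f}%")
print(f"Fixed, known outcomes only: {fixed_share_known:.1f}%")
```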