# Data Manipulation

## Load the airquality data set, a set of New York air quality measurements.

In [1]:
df <- airquality

head(df)


  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6


### For which attributes are there missing values?

In [3]:
colSums(is.na(df))

Values are missing for Ozone and Solar.R.
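As a minimal sketch, the columns with missing values can also be extracted by name from the same `colSums(is.na(...))` result. The variable names `na_counts` and `cols_with_na` are illustrative and not part of the original notebook; the block uses the built-in `airquality` data directly so it runs on its own.

```r
# Count missing values per column of the built-in airquality data set
na_counts <- colSums(is.na(airquality))

# Keep only the names of the columns that contain at least one NA
cols_with_na <- names(na_counts[na_counts > 0])

print(cols_with_na)
```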

### Are all the attributes in the most suitable data type? Make any changes you find necessary.

In [5]:
str(df)

'data.frame':	153 obs. of  6 variables:
 $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
 $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
 $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
 $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...


All the attributes are in a suitable data type.

### What period of the year do these records refer to?


In [8]:
min_month <- min(df$Month)
max_month <- max(df$Month)

paste("Period:", min_month, "-", max_month)
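
The numeric month codes can be turned into month names with the base R constant `month.name`. This is only a sketch of one way to present the answer; `month_range` and `period` are illustrative names, and the block reads the built-in `airquality` data directly.

```r
# Smallest and largest month code in the built-in airquality data set
month_range <- range(airquality$Month)

# month.name is a base R constant holding the twelve English month names
period <- month.name[month_range]

print(period)
```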



### Load the package dplyr and save the data set in a table data frame format.

In [11]:
#install.packages("dplyr")
library(dplyr)


In [13]:
df <- as_tibble(df)

### Select the days in May with a temperature above 70 Fahrenheit.

In [15]:
may_above_70 <- df %>%
  filter(Month == 5, Temp > 70)

days <- may_above_70$Day
print(days)

[1]  2  3 11 22 29 30 31


### Create a new attribute TempC which represents the temperature values in Celsius.

In [16]:
df <- df %>%
  mutate(TempC = (Temp - 32) * (5/9))

# View the updated dataframe
print(df)

# A tibble: 153 × 7
   Ozone Solar.R  Wind  Temp Month   Day TempC
   <int>   <int> <dbl> <int> <int> <int> <dbl>
 1    41     190   7.4    67     5     1  19.4
 2    36     118   8      72     5     2  22.2
 3    12     149  12.6    74     5     3  23.3
 4    18     313  11.5    62     5     4  16.7
 5    NA      NA  14.3    56     5     5  13.3
 6    28      NA  14.9    66     5     6  18.9
 7    23     299   8.6    65     5     7  18.3
 8    19      99  13.8    59     5     8  15  
 9     8      19  20.1    61     5     9  16.1
10    NA     194   8.6    69     5    10  20.6
# ℹ 143 more rows


### Inspect which were the 30 hottest days

In [20]:
hottest_30_days <- df %>%
  arrange(desc(Temp)) %>%
  head(30)

hottest_30_days <- hottest_30_days$Day

print(hottest_30_days)

 [1] 28 30 29 31 11  3  4 12  8  9 10  2 14  1  9  8  9 10  7  7 28 27  8 10 19
[26]  6  5 24 27 29
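
A day number on its own does not identify a date, since the same day number occurs in several months. The sketch below keeps `Month` and `Temp` next to `Day`; `hottest_30` is an illustrative name, and the block starts from the built-in `airquality` data so it can run independently.

```r
library(dplyr)

hottest_30 <- airquality %>%
  arrange(desc(Temp)) %>%       # hottest records first
  select(Month, Day, Temp) %>%  # keep Month so each day is unambiguous
  head(30)                      # first 30 rows after sorting

print(hottest_30)
```

Note that `head(30)` cuts ties at the 30th position arbitrarily. If tied temperatures should all be kept, `slice_max(Temp, n = 30)` is an alternative, since it retains ties by default and may therefore return more than 30 rows.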


### Inspect which were the hottest days, but also with the highest ozone values.

In [22]:
hottest_days_with_highest_ozone <- df %>%
  arrange(desc(Temp), desc(Ozone)) %>%
  head()

hottest_days_with_highest_ozone <- hottest_days_with_highest_ozone$Day

print(hottest_days_with_highest_ozone)

[1] 28 30 29 31  4  3


### Inspect the number of days with a record in each month.


In [23]:
month_counts <- table(df$Month)

print(month_counts)


 5  6  7  8  9 
31 30 31 31 30 
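
Since dplyr is already loaded in this notebook, the same tally could be sketched with `count()`, which returns a data frame rather than a `table` object. `records_per_month` is an illustrative name, and the built-in `airquality` data is used directly.

```r
library(dplyr)

# One row per month, with the number of records in column n
records_per_month <- airquality %>%
  count(Month)

print(records_per_month)
```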


### For each month, obtain the minimum and the maximum temperature registered in Celsius.

In [24]:
temperature_summary <- aggregate(TempC ~ Month, data = df, FUN = function(x) c(min = min(x), max = max(x)))

print(temperature_summary)

  Month TempC.min TempC.max
1     5  13.33333  27.22222
2     6  18.33333  33.88889
3     7  22.77778  33.33333
4     8  22.22222  36.11111
5     9  17.22222  33.88889
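
An equivalent dplyr formulation, shown here only as a sketch, groups by month and summarises. It recomputes `TempC` from the built-in `airquality` data so that it runs on its own; `temp_range`, `min_TempC` and `max_TempC` are illustrative names.

```r
library(dplyr)

temp_range <- airquality %>%
  mutate(TempC = (Temp - 32) * 5 / 9) %>%   # Fahrenheit to Celsius
  group_by(Month) %>%
  summarise(
    min_TempC = min(TempC),                 # coldest record of the month
    max_TempC = max(TempC)                  # hottest record of the month
  )

print(temp_range)
```

Unlike the matrix column produced by `aggregate()` with a vector-valued function, this yields two ordinary numeric columns, which are easier to use in later steps.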


### Obtain the average of the following parameters by month: temperature in Celsius, wind, solar radiation and ozone.


In [25]:
average_parameters <- aggregate(cbind(TempC, Wind, Solar.R, Ozone) ~ Month, data = df, FUN = mean)

print(average_parameters)

  Month    TempC      Wind  Solar.R    Ozone
1     5 19.14352 11.504167 182.0417 24.12500
2     6 25.67901 12.177778 184.2222 29.44444
3     7 28.82479  8.523077 216.4231 59.11538
4     8 28.71981  8.860870 173.0870 60.00000
5     9 24.94253 10.075862 168.2069 31.44828


### What values did you obtain for the ozone and solar radiation attributes? Why? Make the necessary change so that you get the average of the registered values.

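We remove the rows that contain missing values. Note that the formula interface of `aggregate()` already drops incomplete rows by default (`na.action = na.omit`), which is why the previous averages were not `NA` and why the result below is unchanged.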

In [26]:
df <- na.omit(df)

In [28]:
average_parameters <- aggregate(cbind(Ozone, Solar.R) ~ Month, data = df, FUN = mean)

print(average_parameters)

  Month    Ozone  Solar.R
1     5 24.12500 182.0417
2     6 29.44444 184.2222
3     7 59.11538 216.4231
4     8 60.00000 173.0870
5     9 31.44828 168.2069
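
The question most likely refers to the `NA` averages that appear when `mean()` is applied per group without handling missing values. The sketch below, which assumes dplyr 1.0 or later for `across()`, shows that behaviour and then averages only the registered values of each column with `na.rm = TRUE`. The names `by_month`, `with_na` and `without_na` are illustrative, and the built-in `airquality` data is used because `df` has already been reduced by `na.omit()` at this point.

```r
library(dplyr)

by_month <- airquality %>%
  group_by(Month)

# Without na.rm, a month that contains any NA yields NA for that column
with_na <- by_month %>%
  summarise(Ozone = mean(Ozone), Solar.R = mean(Solar.R))

# With na.rm = TRUE, each column is averaged over its own recorded values
without_na <- by_month %>%
  summarise(across(c(Ozone, Solar.R, Wind), ~ mean(.x, na.rm = TRUE)))

print(with_na)
print(without_na)
```

This differs from `na.omit(df)`, which discards a whole row when any column is missing. With `na.rm = TRUE` a row that lacks only `Ozone` still contributes to the `Solar.R` and `Wind` averages, so the per-column results can differ from those obtained after `na.omit()`.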
