## Scenario: Climate Data Analysis for a Research Center - Assignment 4
As a data scientist at a climate research center, you have been provided with daily temperature and humidity data collected from 500 locations over one year. Your objective is to analyze this data for trends, seasonal patterns, and other useful metrics that can aid in understanding climate change across various regions.

### Task 1 - Initialize Temperature and Humidity Data
Set up two arrays to represent daily data:
- temperature_data: Randomly generated temperature values in Celsius, ranging between -10 and 40 degrees, for each of the 500 locations across 365 days.
- humidity_data: Randomly generated humidity percentages, ranging from 0 to 100, for each location and day.

In [76]:
import numpy as np
import random
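# np.random.randint(-10, 40, (365, 500)) would give integers only; temperatures are
# continuous, so floats from np.random.uniform are a better fit.
# Shape is (365, 500): one row per day, one column per location.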
temperature_data = np.random.uniform(-10, 40, (365, 500))
humidity_data = np.random.uniform(0, 100, (365, 500))
temperature_data, humidity_data

(array([[20.83478985, 33.83842057,  2.98101517, ..., 20.80049654,
         36.00315684, 29.9943652 ],
        [-9.66945125,  5.92263067,  4.38906749, ...,  3.72159265,
         17.88509903, 15.69139628],
        [ 6.01797258, 26.28363705, 27.19789664, ..., 20.0583516 ,
         14.24640607,  1.68031682],
        ...,
        [28.56633375,  0.73257391, 31.01722778, ..., 19.08567521,
         39.02286386, 14.38547036],
        [39.74018002, 37.05140114, -7.13816081, ..., 38.58365754,
         26.89818385, 20.64386002],
        [-9.70761037,  7.2510338 , 27.47539662, ..., 20.66578494,
         -5.31140708, 36.64059717]]),
 array([[35.26304199, 88.29529926, 47.80801153, ..., 69.45174066,
         96.88555688, 79.60674952],
        [32.36554911, 92.6815631 , 63.15239935, ..., 29.09039724,
         24.96192952, 99.50724009],
        [17.56070774, 83.15257074, 38.89301607, ..., 84.1740894 ,
         46.87624025,  4.99827396],
        ...,
        [ 0.17632581, 10.40268762, 43.24855299, ...,  
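
Because the arrays above are drawn without a seed, every rerun produces different numbers. A minimal sketch of a reproducible variant follows, assuming NumPy's `default_rng` and an arbitrary seed of 42; neither appears in the original cell, and the names `temps` and `humidity` are placeholders.

```python
import numpy as np

# Assumption: a fixed seed (42 is arbitrary) so that reruns give identical data.
rng = np.random.default_rng(42)

n_days, n_locations = 365, 500
temps = rng.uniform(-10, 40, size=(n_days, n_locations))    # degrees Celsius
humidity = rng.uniform(0, 100, size=(n_days, n_locations))  # percent

# Sanity checks: rows are days, columns are locations, values stay in range.
print(temps.shape, humidity.shape)
print(temps.min() >= -10, temps.max() <= 40)
```

Seeding only matters if the printed results need to be compared across runs; the analysis steps themselves do not depend on it.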

### Task 2 - Check for Missing Data
- Simulate missing data by randomly setting 5% of the values in temperature_data
and humidity_data to null values. Determine how many null values exist in each
array and report the total number of missing entries.

In [77]:
size = temperature_data.size
count = int(size * 0.05)  # 5% of all entries
# Pick distinct flat positions independently for each array
indices_temp = random.sample(range(size), count)
indices_hum = random.sample(range(size), count)
# Assign through the flat iterator; no flatten/reshape round trip is needed
temperature_data.flat[indices_temp] = np.nan
humidity_data.flat[indices_hum] = np.nan
c1 = np.isnan(temperature_data).sum()
c2 = np.isnan(humidity_data).sum()
print("count of null values for temperature_data = {}\ncount of null values for humidity_data = {}".format(c1, c2))
print("total numbers of missing entries: {}".format(c1+c2))

count of null values for temperature_data = 9125
count of null values for humidity_data = 9125
total numbers of missing entries: 18250
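
The reported counts follow directly from the array size: 365 × 500 = 182,500 entries, and 5% of that is 9,125 per array. The sketch below reproduces that arithmetic with a purely NumPy-based sampling step. It assumes a seeded `default_rng` generator and a stand-in array named `data`, both of which are illustrative and not part of the original cell.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: arbitrary seed for repeatability
data = rng.uniform(-10, 40, size=(365, 500))

n_missing = int(data.size * 0.05)  # 5% of 182,500 entries

# Draw distinct flat positions so exactly n_missing entries become NaN.
idx = rng.choice(data.size, size=n_missing, replace=False)
data.flat[idx] = np.nan

print(n_missing, int(np.isnan(data).sum()))
```

Sampling without replacement is what guarantees the exact count; sampling with replacement could hit the same position twice and leave fewer than 5% missing.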


### Task 3 - Convert Temperature and Calculate Discomfort Index
Convert temperature_data from Celsius to Fahrenheit to facilitate data sharing with
international teams. Then, compute a "feels like" discomfort index by combining
temperature and humidity data.
- Ensure that any values in the "feels like" index that exceed 80 are capped at
80, meaning they should be set to 80 if they are originally greater than 80.

In [78]:
far = temperature_data * 9/5 + 32

# Discomfort index formula taken from online sources for this assignment.
# It takes temperature in Fahrenheit and humidity in percent.
discomfort_index = 0.5 * (far + 61.0 +
                          ((far - 68.0) * 1.2) +
                          (humidity_data * 0.094))

# Cap "feels like" values at 80
discomfort_index = np.where(discomfort_index > 80, 80, discomfort_index)
discomfort_index


array([[67.81024687, 80.        , 33.04938659, ..., 69.34921497,
        80.        , 80.        ],
       [        nan, 40.98284219, 36.55851639, ...,         nan,
        61.48570677,         nan],
       [37.64093897, 80.        , 80.        , ..., 68.57171837,
        55.31106731, 28.46194618],
       ...,
       [80.        , 26.83942267,         nan, ..., 62.85654997,
        80.        , 56.08602819],
       [80.        , 80.        , 11.77199675, ..., 80.        ,
        80.        , 66.99801675],
       [10.13091206, 43.75651352,         nan, ..., 67.68255166,
        17.39781029, 80.        ]])
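
To see the formula and the cap at work on individual values, here is a small sketch that wraps the same arithmetic in a function. The names `feels_like` and `cap` are hypothetical and chosen for illustration. For 25 °C (77 °F) and 50% humidity the index is 0.5 × (77 + 61 + 10.8 + 4.7) = 76.75, while 35 °C and 80% humidity gives about 97.96, which is capped at 80.

```python
import numpy as np

def feels_like(temp_c, humidity_pct, cap=80.0):
    """Capped discomfort index from Celsius temperature and percent humidity."""
    temp_f = np.asarray(temp_c, dtype=float) * 9 / 5 + 32
    hum = np.asarray(humidity_pct, dtype=float)
    index = 0.5 * (temp_f + 61.0 + (temp_f - 68.0) * 1.2 + hum * 0.094)
    # np.minimum propagates NaN, so missing inputs stay missing after capping.
    return np.minimum(index, cap)

sample = feels_like([25.0, 35.0, np.nan], [50.0, 80.0, 40.0])
print(sample)
```

`np.minimum` is used here instead of `np.where`; both leave NaN entries untouched, which matches the NaN values visible in the output above.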

### Task 4 - Analyze January Temperatures
- Extract the daily temperatures for January (first 31 days). Calculate and display the
average January temperature across all 500 locations.

In [79]:
# Rows 0-30 are the first 31 days (January); keep all 500 locations
jan_temperatures = temperature_data[:31, :]
# nanmean skips the missing values introduced in Task 2
avg_jan_temperature = np.nanmean(jan_temperatures)
print("Average January temperature across all locations:", avg_jan_temperature)


Average January temperature across all locations: 15.066354010383302
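
The cell above reduces January to a single number. The sketch below, on a tiny made-up array, shows how `np.nanmean` differs from `np.mean` when values are missing, and how `axis=0` would give one average per location instead of one overall figure. The array `temps` is illustrative data, not taken from the notebook.

```python
import numpy as np

# Tiny stand-in for temperature_data: 3 days x 2 locations, one missing value.
temps = np.array([[10.0, 20.0],
                  [np.nan, 22.0],
                  [14.0, 24.0]])

overall = np.nanmean(temps)               # one scalar over all non-NaN entries
per_location = np.nanmean(temps, axis=0)  # one mean per column (location)
plain = np.mean(temps)                    # NaN, because mean does not skip NaN

print(overall, per_location, plain)
```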


### Task 5 - Identify Extreme Temperatures
- Mark any temperature in temperature_data that exceeds 35°C as a potential error
by replacing it with a null value. Count the number of null values per location.

In [80]:
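# Treat readings above 35 °C as potential errors and replace them with NaN
temperature_data = np.where(temperature_data > 35, np.nan, temperature_data)
# Per-location NaN count; note that it also includes the NaNs injected in Task 2
null_count = np.isnan(temperature_data).sum(axis=0)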
print("Null values per location due to extreme temperatures:\n", null_count)


Null values per location due to extreme temperatures:
 [51 57 50 39 63 56 56 52 48 55 46 52 46 49 49 43 61 57 56 50 55 67 54 52
 41 59 50 57 41 51 57 63 47 73 49 56 52 46 57 45 53 49 51 54 59 62 44 54
 57 52 44 47 43 57 41 57 52 45 46 44 52 49 50 51 46 59 67 49 60 52 58 51
 58 37 49 63 45 46 43 50 42 73 40 53 48 49 54 66 49 52 46 57 45 40 53 62
 54 50 51 54 56 58 61 55 56 60 53 46 56 49 58 62 53 57 40 48 55 56 56 56
 50 66 58 58 47 52 58 61 45 51 55 51 57 63 53 52 57 51 62 67 64 56 53 47
 36 56 69 51 65 51 41 45 47 54 57 46 58 45 51 47 51 50 57 48 55 58 53 54
 40 53 62 67 52 68 64 51 65 50 53 49 44 52 54 52 55 54 50 48 54 61 57 53
 53 55 62 57 57 53 53 42 52 51 51 55 60 51 47 53 59 54 50 48 52 61 61 55
 54 48 53 50 58 55 54 50 54 44 45 50 52 54 46 49 53 45 54 60 55 40 60 59
 57 48 45 53 48 64 59 48 53 59 43 57 42 43 53 57 51 47 59 46 46 48 50 54
 68 57 53 57 39 56 50 49 46 49 51 59 50 53 57 59 45 57 54 56 44 48 51 47
 53 45 63 56 43 54 44 55 47 53 53 51 65 56 53 45 43 50 47 49 65 45 42
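
The counts above mix two sources of NaN: the values blanked out in Task 2 and the readings above 35 °C. If the two need to be reported separately, one option is to count the extreme readings before replacing them. This is a sketch on a small made-up array; the variable names are illustrative.

```python
import numpy as np

temps = np.array([[36.0, 10.0, np.nan],
                  [20.0, 38.5, 12.0],
                  [np.nan, 40.0, 35.0]])

extreme = temps > 35                        # NaN > 35 is False, so no double counting
flagged = np.where(extreme, np.nan, temps)  # same replacement as in the cell above

extreme_per_location = extreme.sum(axis=0)               # only the >35 readings
total_null_per_location = np.isnan(flagged).sum(axis=0)  # extremes plus earlier NaNs

print(extreme_per_location, total_null_per_location)
```

A reading of exactly 35 is kept, since the task says "exceeds 35°C".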

### Task 6 - Calculate Quarterly Temperature Averages
- Reshape temperature_data into four quarters (one per season) and calculate the
average temperature for each location across these quarters.

In [81]:
quarterly_temperature_data = np.array_split(temperature_data, 4, axis=0)

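# 365 is not divisible by 4, so array_split yields blocks of 92, 91, 91 and 91 days.
# Average temperature for each location within each block, ignoring NaN
quarterly_avg = [np.nanmean(quarter, axis=0) for quarter in quarterly_temperature_data]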
print("Quarterly temperature averages for each location:\n", quarterly_avg)


Quarterly temperature averages for each location:
 [array([13.836125  , 12.11968299, 12.950361  , 13.49955892, 14.58904115,
       10.71638727,  8.95988206, 11.3419124 , 10.27950323, 15.09978647,
       13.06255398,  9.78692115, 13.22272764, 14.03878055, 12.92832868,
       13.50953196, 13.28793271, 13.52243037, 12.9017763 , 12.5316957 ,
       11.06374678, 12.95583942, 13.17671656, 12.0243066 , 12.78689421,
       11.64738247, 14.68133245, 12.08703699, 11.65860638, 11.26370831,
       14.40747974,  9.72855896, 12.48724543, 13.50333189, 14.19066373,
       12.5671939 , 12.09311233, 12.12051427, 11.0541566 , 13.72213081,
       12.48170882, 11.98544604, 12.52156554, 12.76855679, 10.79167889,
       13.28695493, 11.14038573, 13.51395507, 11.00379155, 12.58699417,
       10.83485768, 11.62277256, 12.89478038, 12.64574166, 10.21541879,
       12.44599174, 15.46803067, 11.51523105, 12.89879619, 13.7326646 ,
       13.77510338, 14.07205543, 13.6426673 , 12.44453182, 12.20905182,
       12.95
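
The result above is a Python list of four arrays, which prints at length. A compact alternative is to stack the four per-location means into one array of shape (4, locations). The sketch below assumes synthetic data and only 5 locations so the shapes are easy to read. Note that `np.array_split` produces nearly equal blocks of days, which only approximate calendar quarters (90, 91, 92 and 92 days in a non-leap year).

```python
import numpy as np

rng = np.random.default_rng(1)  # assumption: arbitrary seed, synthetic data
temps = rng.uniform(-10, 35, size=(365, 5))  # 5 locations for readability

quarters = np.array_split(temps, 4, axis=0)
block_sizes = [q.shape[0] for q in quarters]
print(block_sizes)  # days per block

# Stack the per-location means: row i holds the averages for block i.
quarterly_avg = np.vstack([np.nanmean(q, axis=0) for q in quarters])
print(quarterly_avg.shape)
```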

### Task 7 - Classify Humidity Levels
- Classify each day’s humidity level as "Dry" if below 30% and "Humid" if above 70%,
and count the total number of "Dry" and "Humid" days for each location.

In [82]:
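# Classify humidity levels. NaN fails both comparisons, so missing readings are
# labeled "Normal"; this does not affect the "Dry" and "Humid" counts below.
humidity_classification = np.where(humidity_data < 30, "Dry", np.where(humidity_data > 70, "Humid", "Normal"))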

dry_days = np.sum(humidity_classification == "Dry", axis=0)
humid_days = np.sum(humidity_classification == "Humid", axis=0)
print("Dry days per location:\n", dry_days)
print("Humid days per location:\n", humid_days)


Dry days per location:
 [109 109  90 108 106  95  89  90  94 110 115 100  88 107 101 108  85 101
 107 108  96 117  97 111 126 106  86 118 103  96 112  94  98 105 103  92
 124 108 118 105  94  96 104 128  98 111  96  96 111 101 109 110 108  95
  95 114 112 117  94 105 103 118 112  99 126 118 115 104 112 110 102 100
 109 106  96  94 116 104  92 121 114  88  98 114 108  98  99 115 111  91
 107 114  94  93 105 108 116 105 100 108  98  98  85  89  92 101 117  95
  97 119 103  83  92  96  98 102 109 115 116  78 109 101 102  95 107 109
 112 107 102 105 104 105  95 105 108 106  98 108 115  89  86 106  97  95
  97 103 115 115 106  93 111 104 108 107 116 107 113 112  84  92 105  97
  98 104 102 105 105 109 107 107 124 118 116 117 101  80  99 107 113 108
 101  97  97 101 109  99 110 104 110 100 113 108  94 104 105  93  92 106
  90 117 106 125  95 114 106 110  98 107 112 101 116 109 110 110 107 100
  91 108  95 101 110 105 105 106 114 126 104  95 111  96  97 123 105 108
 127 117 109 105 124 115  9
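
With the nested `np.where` above, a missing humidity reading ends up in the "Normal" class. If missing days should be kept apart, `np.select` can test for NaN first. The following is a sketch on made-up values; the "Missing" label is an assumption and not part of the assignment.

```python
import numpy as np

humidity = np.array([[10.0, 75.0],
                     [np.nan, 50.0],
                     [29.9, 90.0]])

# Conditions are checked in order; the first match wins.
conditions = [np.isnan(humidity), humidity < 30, humidity > 70]
labels = ["Missing", "Dry", "Humid"]
classes = np.select(conditions, labels, default="Normal")

dry_days = (classes == "Dry").sum(axis=0)
humid_days = (classes == "Humid").sum(axis=0)

print(classes)
print(dry_days, humid_days)
```

The "Dry" and "Humid" counts are the same either way; only the residual class changes.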

### Task 8 - Apply Daily Pressure Trend to Temperature Data
- Account for daily atmospheric pressure variations by generating a trend across the
365 days and applying it to adjust daily temperatures at each location.

In [83]:
pressure_trend = 1 + 0.1 * np.sin(np.linspace(0, 2 * np.pi, 365))

# Adjust temperature data with the daily pressure trend.
# [:, np.newaxis] turns the trend into shape (365, 1) so it broadcasts across all 500 locations.
adjusted_data = temperature_data * pressure_trend[:, np.newaxis]
print("Adjusted temperature data based on pressure trend:\n", adjusted_data)


Adjusted temperature data based on pressure trend:
 [[20.83478985 33.83842057  2.98101517 ... 20.80049654         nan
  29.9943652 ]
 [-9.68614135  5.93285351  4.3966433  ...         nan 17.91596986
          nan]
 [ 6.0387443  26.37435802 27.29177328 ... 20.12758528 14.29557916
   1.68611662]
 ...
 [28.46773379  0.73004535 30.91016829 ... 19.01979883         nan
  14.33581727]
 [        nan         nan -7.12583989 ...         nan 26.85175586
  20.60822739]
 [-9.70761037  7.2510338          nan ... 20.66578494 -5.31140708
          nan]]
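
The adjustment relies on broadcasting: the trend has one factor per day, and reshaping it to a column lets NumPy apply that factor to every location on that day. The sketch below uses a constant 20 °C at 3 hypothetical locations so the effect of the factor is easy to read off.

```python
import numpy as np

n_days = 365
pressure_trend = 1 + 0.1 * np.sin(np.linspace(0, 2 * np.pi, n_days))

temps = np.full((n_days, 3), 20.0)    # constant 20 °C at 3 hypothetical locations
trend_column = pressure_trend[:, np.newaxis]  # shape (365, 1)
adjusted = temps * trend_column       # broadcasts to shape (365, 3)

print(trend_column.shape, adjusted.shape)
print(adjusted[0, 0], adjusted.min(), adjusted.max())
```

The factor stays between 0.9 and 1.1 and equals 1 on the first day, which is why the first row of the adjusted output above matches the unadjusted temperatures. Because the adjustment is multiplicative on Celsius values, sub-zero temperatures move further below zero when the factor is above 1.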
