# US009 - Water consumption cost

## Introduction

User Story 009 analyzes water consumption data on a daily and monthly basis, including the associated costs and descriptive statistics. The input for this analysis is a .csv file with the fields name, year, month, day, and consumption. The results support park maintenance, consumption control, and cost management.

## Monthly consumption
 
Input: 

1. .csv file
2. park name
3. year
4. start month
5. end month

Output: 

1. bar chart

With this data, the program generates a bar chart depicting the monthly consumption of the chosen park between the start and end months of the given year.
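The aggregation behind the bar chart can be sketched as follows. This is a minimal illustration, not the program's actual code: the column names, the comma delimiter, the park names `ParkA` and `ParkB`, and the function `monthly_consumption` are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the real file's column names and delimiter may differ.
SAMPLE_CSV = """name,year,month,day,consumption
ParkA,2023,1,1,10.5
ParkA,2023,1,2,12.0
ParkA,2023,2,1,8.0
ParkB,2023,1,1,99.0
ParkA,2024,1,1,50.0
"""

def monthly_consumption(csv_text, park, year, start_month, end_month):
    """Sum the daily consumption per month for one park and one year."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = int(row["month"])
        if (row["name"] == park and int(row["year"]) == year
                and start_month <= month <= end_month):
            totals[month] += float(row["consumption"])
    # Months in calendar order, ready to be used as bar heights.
    return dict(sorted(totals.items()))

print(monthly_consumption(SAMPLE_CSV, "ParkA", 2023, 1, 2))  # {1: 22.5, 2: 8.0}
```

The resulting dictionary maps each month to its total, which is the shape a bar chart needs: keys on the x-axis and values as bar heights.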

## Average Costs (Monthly)

Input:

1. .csv file containing consumption data
2. Number of parks to be analyzed
3. Respective park names

Output:

1. Average monthly costs for each chosen park

Given:

1. Monthly consumption ($c$)
2. Price: €0.7 per $m^3$
3. Surcharge: 15% on excess consumption exceeding 1000 $m^3$
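One possible reading of these rules is that the first 1000 $m^3$ of a month cost €0.7 per $m^3$ and only the volume above 1000 $m^3$ is charged 15% more. Under that assumption, the cost is $0.7c$ when $c \le 1000$ and $700 + 0.7 \times 1.15 \times (c - 1000)$ otherwise. The sketch below follows this reading; the function and constant names are hypothetical.

```python
PRICE_PER_M3 = 0.7    # euros per cubic metre (from the text)
THRESHOLD_M3 = 1000   # monthly volume above which the surcharge applies
SURCHARGE = 0.15      # assumption: 15% extra applies to the excess only

def monthly_cost(consumption_m3):
    """Cost of one month's consumption under the assumed surcharge rule."""
    base = min(consumption_m3, THRESHOLD_M3)
    excess = max(consumption_m3 - THRESHOLD_M3, 0)
    return PRICE_PER_M3 * base + PRICE_PER_M3 * (1 + SURCHARGE) * excess

def average_monthly_cost(monthly_consumptions):
    """Average of the monthly costs for a list of monthly consumptions."""
    costs = [monthly_cost(c) for c in monthly_consumptions]
    return sum(costs) / len(costs)

print(round(monthly_cost(500), 2))   # below the threshold: 0.7 * 500
print(round(monthly_cost(1200), 2))  # 700 for the first 1000 m^3 plus 161 for the excess
```

If the surcharge were meant to apply to the whole bill once the threshold is crossed, only `monthly_cost` would need to change.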

## Parks with Lowest and Highest Daily Consumption

Input: 

1. .csv file

Output, computed on the daily consumption of the parks with the lowest and highest consumption:

1. mean
2. median
3. standard deviation
4. skewness coefficient
5. relative and absolute frequency tables (5 classes)
6. outliers
7. histograms (10 and 100 classes)

### Mean

The mean indicates the central value around which the $n$ elements of a set are distributed.

Let $\bar{x}$ be the mean and the set of $n$ values be: $x_1$, $x_2$, ..., $x_n$ :

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Automatically calculated in the program using the .mean() function.

### Median

When all $n$ values are sorted, the median corresponds to the "central" value if $n$ is odd, and to the average of the two "central" values if $n$ is even.

$\tilde{x} = \begin{cases}
\frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}, & \text{if } n \text{ even} \\
x_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ odd}
\end{cases}
$

Automatically calculated in the program using the .median() function.

### Standard Deviation

The standard deviation measures how much the data vary around the mean ($\mu$); a higher standard deviation means that values tend to lie farther from the expected/mean value ($\mu$).

The standard deviation is the square root of the variance, thus:

$\sigma = \sigma_x = \sqrt{E[(X - \mu)^2]}$

Automatically calculated in the program using the .std() function.
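The mean, median, and standard deviation defined above can be checked on a small sample. The sketch below uses Python's standard `statistics` module instead of pandas so that it runs without extra dependencies, and the sample values are hypothetical. Note that the formula above is the population form (dividing by $n$), while pandas' `.std()` divides by $n - 1$ by default, as `statistics.stdev` does.

```python
import statistics

daily_consumption = [10, 12, 14, 16, 48]  # hypothetical daily values in m^3

mean = statistics.mean(daily_consumption)
median = statistics.median(daily_consumption)
sample_std = statistics.stdev(daily_consumption)       # divides by n - 1
population_std = statistics.pstdev(daily_consumption)  # divides by n

print(mean, median, round(sample_std, 3), round(population_std, 3))
```

The single large value (48) pulls the mean above the median, which is the kind of asymmetry the skewness coefficient quantifies.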

### Skewness Coefficient

Given $m_3$ as the third moment about the mean and $s$ as the experimental standard deviation, the skewness coefficient is calculated as follows:

$a_3 = \frac{m_3}{s^3}$

where

$m_3 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3$

If the distribution is symmetric, then $a_3 = 0$ and, for a unimodal distribution, the mean, median, and mode coincide. Otherwise, the sign tells whether the distribution is skewed to the right ($a_3 > 0$) or to the left ($a_3 < 0$). This is visually apparent from the concentration of occurrences on the left or on the right, respectively.

Automatically calculated in the program using the .skew() function.
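The formula $a_3 = \frac{m_3}{s^3}$ can be applied directly, as in the sketch below. It assumes that $s$ is the sample standard deviation (dividing by $n - 1$); the function name `skewness` is hypothetical. pandas' `.skew()` applies a bias correction, so its result can differ slightly from this direct computation, although the sign agrees.

```python
import statistics

def skewness(values):
    """Skewness coefficient a3 = m3 / s^3, following the formula in the text."""
    n = len(values)
    mean = statistics.mean(values)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third moment about the mean
    s = statistics.stdev(values)  # assumption: s uses the n - 1 denominator
    return m3 / s ** 3

print(skewness([1, 2, 3, 4, 5]))       # symmetric sample
print(skewness([1, 2, 3, 4, 10]) > 0)  # long tail to the right
print(skewness([1, 7, 8, 9, 10]) < 0)  # long tail to the left
```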

### Absolute and Relative Frequency Tables

Absolute (quantity) and relative (%) frequency tables group the quantity and percentage of occurrences within certain intervals. In this case, 5 classes were requested, so 5 intervals will be presented, encompassing all values of the analyzed set.

The number of occurrences ($e$) is an integer counting the elements of the set that belong to the respective interval, and $n$ is the number of elements in the entire set. The relative frequency ($o$) is calculated as follows:

$o = \frac{e}{n}$

Multiplying $o$ by 100 gives the percentage of occurrences. These tables show the user in which ranges the values are concentrated.
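The counting step can be sketched as follows, assuming the 5 classes are equal-width intervals between the minimum and maximum value (the text does not say how the class limits are chosen). The function name `frequency_table` is hypothetical, and the sketch assumes the values are not all identical.

```python
def frequency_table(values, classes=5):
    """Return (lower, upper, absolute, relative) for equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / classes  # assumes hi > lo
    counts = [0] * classes
    for x in values:
        # The maximum value would fall just past the last class, so clamp it.
        idx = min(int((x - lo) / width), classes - 1)
        counts[idx] += 1
    n = len(values)
    return [
        (lo + i * width, lo + (i + 1) * width, e, e / n)
        for i, e in enumerate(counts)
    ]

for lower, upper, e, o in frequency_table(list(range(1, 11))):
    print(f"[{lower:.1f}, {upper:.1f})  e={e}  o={o:.2f}")
```

Each row holds the class limits, the absolute frequency $e$, and the relative frequency $o = e/n$; the relative frequencies add up to 1.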

The program constructs these tables using tabulate.

### Outliers

Given the Interquartile Range $r_q$:

$r_q = q_3 - q_1$

where $q_1$ and $q_3$ are, respectively, the first and third quartiles. Given the median ($\tilde{x}$), outliers are all values that fall outside the interval $[i_e, i_d]$, where:

$i_d = \tilde{x} + 1.5\times r_q$

$i_e = \tilde{x} - 1.5\times r_q$

Identifying outliers reveals values that deviate significantly from the median.
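The rule above, with the interval centred on the median, can be sketched as follows. The function name `find_outliers` and the sample values are hypothetical. Quartiles are computed with `statistics.quantiles(..., method="inclusive")`, chosen here because it interpolates linearly between data points; the values may still differ slightly from those returned by the program's `.quantile()` calls.

```python
import statistics

def find_outliers(values):
    """Values outside [median - 1.5 * IQR, median + 1.5 * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    median = statistics.median(values)
    lower = median - 1.5 * iqr  # i_e in the text
    upper = median + 1.5 * iqr  # i_d in the text
    return [x for x in values if x < lower or x > upper]

print(find_outliers([10, 12, 13, 14, 15, 16, 18, 60]))  # [60]
```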

The first and third quartiles were calculated in the program, respectively, using the .quantile(0.25) and .quantile(0.75) functions.

### Histograms (10 and 100 classes)

A histogram represents an absolute frequency table graphically, here with 10 and with 100 intervals. It is similar to a bar chart, but the width of each bar covers a range of values on the x-axis rather than a single value.

Histograms are constructed using matplotlib.pyplot.




# US010 - Pie chart of the equipment used each day

## Introduction

User Story 010 aims to identify the park equipment used most frequently each day in order to understand the preferences of park users. This user story receives a file with individuals' choices and outputs a pie chart with the percentage of usage of each piece of equipment.

## Percentage of use of park equipment

Input:

1. .csv file containing park users' equipment choices
2. Name of the equipment

Output:

1. Pie chart displaying the percentage of usage for each park equipment
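The percentages behind the pie chart can be sketched as a simple count of choices. The equipment names and the function `usage_percentages` are hypothetical, and the sketch assumes one equipment choice per record.

```python
from collections import Counter

def usage_percentages(choices):
    """Percentage of records in which each piece of equipment was chosen."""
    counts = Counter(choices)
    total = sum(counts.values())
    return {equipment: 100 * count / total for equipment, count in counts.items()}

choices = ["swing", "slide", "swing", "bench"]  # hypothetical sample
print(usage_percentages(choices))  # {'swing': 50.0, 'slide': 25.0, 'bench': 25.0}
```

The dictionary values can be used as slice sizes and the keys as slice labels.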


# US011 - Collect data from the user portal

## Introduction

User Story 011 aims to gather user portal data to understand park usage across different age groups. This user story receives a .csv file as input with three variables: the age range of park users, whether these individuals recommend the park, and how often they use the park per month. The output is the proportion of people who recommend the park in each age group and a boxplot depicting the frequency of visits by age group.

## Proportion of users recommending the park by age group

Input: 

1. .csv file
2. age range
3. recommendations

Output: 

1. Numeric value

With this data, the program counts recommendations by age group. After grouping the responses (Y/N) by age group via the groupby function, the program calculates the proportion of users in each age group who recommend the park.
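The grouping and proportion step can be sketched in plain Python as follows. It mirrors what a groupby on the age group would produce, but the function name `recommendation_proportions`, the input shape (pairs of age group and answer), and the sample responses are assumptions for illustration.

```python
from collections import defaultdict

def recommendation_proportions(responses):
    """responses: iterable of (age_group, answer) pairs, with answer "Y" or "N"."""
    yes = defaultdict(int)
    total = defaultdict(int)
    for age_group, answer in responses:
        total[age_group] += 1
        if answer == "Y":
            yes[age_group] += 1
    # Proportion of "Y" answers within each age group.
    return {group: yes[group] / total[group] for group in total}

responses = [  # hypothetical sample
    ("Child", "Y"), ("Child", "N"),
    ("Adult", "Y"), ("Adult", "Y"),
    ("Senior", "N"),
]
print(recommendation_proportions(responses))
```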

## Frequency of visits by age group

Input: 

1. .csv file
2. age range 
3. number of visits per month

Output: 

1. Boxplot

With this data, the program creates a boxplot (via matplotlib.pyplot), a diagram depicting extremes and quartiles, of the frequency of monthly visits by age group. For each age group, the boxplot graphically represents the values:

* minimum monthly visits
* maximum monthly visits
* median monthly visits

## Research Variables:

### Age Group: 
1. Child (up to 15 years old)
2. Adult (between 16 and 65 years old)
3. Senior (over 65 years old)

### Park Recommendation: 
1. Y (Yes)
2. N (No)

### Monthly Park Visit Frequency: 

Numeric variable representing the number of visits to the park per month.

## Data Analysis:

### Proportion of Users by Age Group who would Recommend the Park:

Calculate the proportion of users in each age group who responded "Y" (Yes) to the park recommendation question.

### Boxplot for Each Age Group:
* Create a boxplot for each age group, representing the distribution of monthly park visit frequency.
* Observe the median, quartiles, and possible outliers for each age group to draw conclusions about park usage patterns.

These analyses are essential for understanding how different age groups utilize the park, as well as their willingness to recommend it to other users. The boxplot provides a clear visualization of park visit frequency trends for each age group.


# Self-assessment

* 1230481 - Ricardo Morim - 28%
* 1231151 - Marisa Afonso - 18%
* 1231170 - Gonçalo Fernandes - 18%
* 1230929 - Ana Filipa Alves - 18%
* 1221018 - Afonso Marques - 18%
