# Regiment

### Introduction:

Special thanks to: http://chrisalbon.com/ for sharing the dataset and materials.

### Step 1. Import the necessary libraries

In [61]:
import pandas as pd

### Step 2. Create the DataFrame with the following values:

In [62]:
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], 
        'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'], 
        'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 
        'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}

### Step 3. Assign it to a variable called regiment.
#### Don't forget to name each column

Naming columns improves clarity throughout the data analysis process.



In [63]:
column_names = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore']


In [64]:
regiment = pd.DataFrame(raw_data, columns=column_names)
regiment

Unnamed: 0,regiment,company,name,preTestScore,postTestScore
0,Nighthawks,1st,Miller,4,25
1,Nighthawks,1st,Jacobson,24,94
2,Nighthawks,2nd,Ali,31,57
3,Nighthawks,2nd,Milner,2,62
4,Dragoons,1st,Cooze,3,70
5,Dragoons,1st,Jacon,4,25
6,Dragoons,2nd,Ryaner,24,94
7,Dragoons,2nd,Sone,31,57
8,Scouts,1st,Sloan,2,62
9,Scouts,1st,Piger,3,70
10,Scouts,2nd,Riani,2,62
11,Scouts,2nd,Ali,3,70
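As a small illustration of what the columns argument does, the sketch below rebuilds the Step 2 data so that it runs on its own, then passes a shorter, reordered list of names. The variable subset and the particular column choice are only for illustration and are not part of the exercise.

```python
import pandas as pd

# Same values as Step 2, rebuilt here so the snippet is self-contained.
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
             'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
}

# columns= selects which dictionary keys to keep and fixes their order.
subset = pd.DataFrame(raw_data, columns=['name', 'regiment', 'postTestScore'])

print(subset.columns.tolist())
print(subset.shape)
```

Passing the full list, as in the cell above, keeps every column and only pins down the order.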


### Step 4. What is the mean preTestScore from the regiment Nighthawks?  


To answer this, filter the rows whose 'regiment' value is 'Nighthawks' with a boolean mask, select the 'preTestScore' column, and take its mean.

In [65]:
mean_regiment = regiment[regiment['regiment'] == 'Nighthawks']['preTestScore'].mean()

In [66]:
print("The mean preTestScore of the Nighthawks is", mean_regiment)

The mean preTestScore of the Nighthawks is 15.25
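The same filter can also be written with .loc, which selects the rows and the column in a single step instead of chaining two bracket lookups. This is a sketch of an equivalent form, not the exercise's official solution; the names is_nighthawk and mean_loc are illustrative.

```python
import pandas as pd

# Same values as Step 2 (the 'name' column is omitted because it is not needed here).
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
}
regiment = pd.DataFrame(raw_data)

# Boolean mask: True for the rows that belong to the Nighthawks.
is_nighthawk = regiment['regiment'] == 'Nighthawks'

# .loc[rows, column] applies the mask and picks the column in one call.
mean_loc = regiment.loc[is_nighthawk, 'preTestScore'].mean()
print(mean_loc)  # 15.25
```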


### Step 5. Present general statistics by company

In [67]:
general_statistics = regiment.groupby('company').agg({
    'preTestScore':['mean', 'median', 'std', 'min', 'max'],
    'postTestScore': ['mean', 'median', 'std', 'min', 'max']
}).reset_index()

- .groupby('company'): Groups the rows of the regiment DataFrame by the values in the 'company' column ('1st' and '2nd').
- .agg(): Aggregates each group. The dictionary maps each numeric column ('preTestScore' and 'postTestScore') to a list of aggregation functions, here 'mean', 'median', 'std', 'min', and 'max'. Because several functions are applied per column, the result has two-level column labels.
- .reset_index(): Moves 'company' from the index back into a regular column and restores the default integer index.

In [68]:
general_statistics

Unnamed: 0_level_0,company,preTestScore,preTestScore,preTestScore,preTestScore,preTestScore,postTestScore,postTestScore,postTestScore,postTestScore,postTestScore
Unnamed: 0_level_1,Unnamed: 1_level_1,mean,median,std,min,max,mean,median,std,min,max
0,1st,6.666667,3.5,8.524475,2,24,57.666667,66.0,27.485754,25,94
1,2nd,15.5,13.5,14.652645,2,31,67.0,62.0,14.057027,57,94
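If the two-level column header is inconvenient, named aggregation (available in pandas 0.25 and later) is one way to get flat column names. The sketch below computes only a few of the statistics, and the output names such as pre_mean are made up for the example.

```python
import pandas as pd

# Same values as Step 2 (the 'name' column is omitted because it is not aggregated).
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
}
regiment = pd.DataFrame(raw_data)

# Named aggregation: each keyword is an output column,
# each value is a (source column, aggregation function) pair.
flat_stats = regiment.groupby('company').agg(
    pre_mean=('preTestScore', 'mean'),
    pre_median=('preTestScore', 'median'),
    post_mean=('postTestScore', 'mean'),
    post_median=('postTestScore', 'median'),
).reset_index()

print(flat_stats.columns.tolist())
print(flat_stats)
```

The trade-off is that every statistic has to be listed explicitly, whereas the dictionary-of-lists form is shorter when the same functions are applied to several columns.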


### Step 6. What is the mean of each company's preTestScore?

In [69]:
mean_companies = regiment.groupby('company')['preTestScore'].mean()

In [70]:
mean_companies

company
1st     6.666667
2nd    15.500000
Name: preTestScore, dtype: float64

- groupby('company'): This groups the regiment DataFrame by the 'company' column. After this operation, we have separate groups of rows where each group corresponds to a unique company ('1st' and '2nd' in this case).

- ['preTestScore']: This specifies that we are interested in the 'preTestScore' column for the subsequent calculations. It selects the 'preTestScore' column from the grouped data.

- .mean(): After selecting the 'preTestScore' column for each group (company), .mean() calculates the mean (average) value of 'preTestScore' for each group separately.

##### Result:
The result mean_companies will be a pandas Series where:

- The index represents the unique values of 'company' ('1st' and '2nd' in this case).
- The values represent the mean preTestScore for each company.
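Because the result is a Series indexed by company, individual means can be looked up by label. The following is a minimal sketch of that; second_company_mean and as_dict are illustrative names.

```python
import pandas as pd

# Same values as Step 2 (only the columns used here).
raw_data = {
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
}
regiment = pd.DataFrame(raw_data)

mean_companies = regiment.groupby('company')['preTestScore'].mean()

# Look up a single group by its index label.
second_company_mean = mean_companies.loc['2nd']
print(second_company_mean)  # 15.5

# Convert the whole Series to a plain {company: mean} dictionary.
as_dict = mean_companies.to_dict()
print(as_dict)
```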

### Step 7. Present the mean preTestScores grouped by regiment and company

In [82]:
mean_pretestscores = regiment.groupby(['regiment', 'company'])['preTestScore'].mean().reset_index()

In [83]:
mean_pretestscores

Unnamed: 0,regiment,company,preTestScore
0,Dragoons,1st,3.5
1,Dragoons,2nd,27.5
2,Nighthawks,1st,14.0
3,Nighthawks,2nd,16.5
4,Scouts,1st,2.5
5,Scouts,2nd,2.5
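For comparison, the sketch below leaves out reset_index() to show the two-level index that grouping by two columns produces, and then uses unstack() as one possible way to lay the same numbers out as a regiment-by-company grid. The names by_pair and wide are illustrative.

```python
import pandas as pd

# Same values as Step 2 (only the columns used here).
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
}
regiment = pd.DataFrame(raw_data)

# Without reset_index(), the result is a Series with a (regiment, company) MultiIndex.
by_pair = regiment.groupby(['regiment', 'company'])['preTestScore'].mean()
print(by_pair.index.nlevels)  # 2

# unstack() moves the 'company' level into the columns: one row per regiment.
wide = by_pair.unstack('company')
print(wide)
```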


### Step 8. Present the mean preTestScores grouped by regiment and company without hierarchical indexing

Pass as_index=False to groupby. The group labels are then kept as regular columns instead of being set as the index, so the resulting DataFrame stays flat.

In [80]:
mean_pretestscores = regiment.groupby(['regiment', 'company'], as_index=False)['preTestScore'].mean()

In [81]:
mean_pretestscores

Unnamed: 0,regiment,company,preTestScore
0,Dragoons,1st,3.5
1,Dragoons,2nd,27.5
2,Nighthawks,1st,14.0
3,Nighthawks,2nd,16.5
4,Scouts,1st,2.5
5,Scouts,2nd,2.5
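The outputs of Step 7 and Step 8 look identical, which suggests that calling reset_index() after the aggregation and passing as_index=False to groupby are two routes to the same flat table. The sketch below checks this on the exercise data; via_reset and via_flag are illustrative names.

```python
import pandas as pd

# Same values as Step 2 (only the columns used here).
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
}
regiment = pd.DataFrame(raw_data)

keys = ['regiment', 'company']

# Route 1: aggregate first, then turn the index levels back into columns.
via_reset = regiment.groupby(keys)['preTestScore'].mean().reset_index()

# Route 2: tell groupby up front not to use the keys as the index.
via_flag = regiment.groupby(keys, as_index=False)['preTestScore'].mean()

# Compare column names and row contents of the two results.
print(via_reset.columns.tolist() == via_flag.columns.tolist())
print(via_reset.to_dict('records') == via_flag.to_dict('records'))
```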


### Step 9. Group the entire dataframe by regiment and company

In [88]:
grouped = regiment.groupby(['regiment', 'company'])


In [91]:
print(grouped)

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000014EE8C11DF0>


The output shows that grouped is a DataFrameGroupBy object; the hexadecimal value is a memory address and will differ between runs. The object records how the rows are split into groups but does not display them. To see meaningful results, apply an operation to it, such as an aggregation (.mean(), .size()) or a loop over the groups.
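A few ways to look inside the grouped object are sketched below: counting the groups, getting the size of each, pulling out one group by its key, and iterating over all of them. The choice of the ('Scouts', '2nd') group and the variable scouts_2nd are only for illustration.

```python
import pandas as pd

# Same values as Step 2, rebuilt here so the snippet is self-contained.
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
             'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
}
regiment = pd.DataFrame(raw_data)

grouped = regiment.groupby(['regiment', 'company'])

# Number of distinct (regiment, company) pairs.
print(grouped.ngroups)  # 6

# Number of rows in each group.
print(grouped.size())

# Pull out a single group as an ordinary DataFrame; the key is a tuple.
scouts_2nd = grouped.get_group(('Scouts', '2nd'))
print(scouts_2nd)

# Iterating yields (key, sub-DataFrame) pairs.
for (reg, comp), frame in grouped:
    print(reg, comp, frame['name'].tolist())
```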