## Data analysis in Python using Pandas 

Pandas is a Python library for data analysis. It offers data exploration, cleaning, and transformation operations that are central to working with data in Python.

In [None]:
!pip install pandas

In [None]:
import pandas as pd

### Happiness data

This dataset is based on a survey in which people rated different aspects of their city on a 5-point scale and stated whether they are happy or unhappy.

The goal is to understand which factors play an important role in making a city's residents happier with their lives.

Data dictionary:

- `infoavail`: the availability of information about the city services
- `housecost`: the cost of housing
- `schoolquality`: the overall quality of public schools
- `policetrust`: your trust in the local police
- `streetquality`: the maintenance of streets and sidewalks
- `events`: the availability of social community events
- `happy`: decision attribute (D) with values 0 (unhappy) and 1 (happy)

### Pandas data loading

Pandas usually loads a dataset into a DataFrame object. A DataFrame is a two-dimensional table of data. Pandas provides a number of ways to load data into a DataFrame, one of which is loading from a CSV file.

In [None]:
df = pd.read_csv('happydata.csv')
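
To try read_csv() without the file at hand, we can feed it CSV text from memory. This is only a sketch: the column names come from the data dictionary, but the three rows are made up for illustration and are not values from happydata.csv.

```python
import io

import pandas as pd

# Made-up sample rows using the column names from the data dictionary;
# these are NOT values from happydata.csv.
csv_text = """infoavail,housecost,schoolquality,policetrust,streetquality,events,happy
3,3,3,4,2,4,0
5,2,4,4,3,5,1
4,1,2,3,4,4,1
"""

# read_csv accepts a file path or a file-like object such as StringIO.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(type(sample_df))             # the loaded object is a DataFrame
print(sample_df.columns.tolist())  # column names are taken from the first line
```

By default the first line of the CSV is used as the header, which is why the column names match the data dictionary.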

We can use the head() method to display the first five rows of the DataFrame, and the tail() method to display the last five rows.

In [None]:
print(df.head())

In [None]:
print(df.tail())
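
Both methods also accept the number of rows to return. The sketch below uses a small made-up DataFrame (hypothetical ratings, not the survey data) so that it runs on its own.

```python
import pandas as pd

# Hypothetical ratings, not taken from happydata.csv.
sample_df = pd.DataFrame({
    "schoolquality": [3, 4, 2, 5, 1, 3, 4, 2],
    "happy": [0, 1, 1, 1, 0, 0, 1, 0],
})

first_three = sample_df.head(3)  # rows at positions 0, 1, 2
last_two = sample_df.tail(2)     # rows at positions 6, 7

print(first_three)
print(last_two)
```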

First, we want to know how many rows the dataset has.

For this we can use Python's built-in len() function. Applied to a DataFrame, it returns the number of rows.

In [None]:
print(len(df))
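
If we also want the number of columns, the shape attribute gives both counts at once. A minimal sketch on made-up data (the values are hypothetical):

```python
import pandas as pd

# Hypothetical ratings, not taken from happydata.csv.
sample_df = pd.DataFrame({
    "schoolquality": [3, 4, 2, 5],
    "policetrust": [4, 4, 3, 5],
    "happy": [0, 1, 1, 1],
})

# shape is a (rows, columns) tuple; len() returns only the row count.
n_rows, n_cols = sample_df.shape

print("Rows:", n_rows)
print("Columns:", n_cols)
```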

Now we want to access the data of a specific column. For this we can put the column name in square brackets after the DataFrame. For example, df['schoolquality'] returns the column named 'schoolquality' as a pandas Series.

In [None]:
print(df["schoolquality"])
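
The same bracket syntax can select several columns when we pass a list of names. The sketch below, on made-up data, shows the difference in the returned type.

```python
import pandas as pd

# Hypothetical ratings, not taken from happydata.csv.
sample_df = pd.DataFrame({
    "schoolquality": [3, 4, 2],
    "policetrust": [4, 4, 3],
    "happy": [0, 1, 1],
})

# A single column name returns a one-dimensional Series.
one_column = sample_df["schoolquality"]

# A list of column names returns a DataFrame with just those columns.
two_columns = sample_df[["schoolquality", "happy"]]

print(type(one_column))
print(type(two_columns))
```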

Once we have a column, we can print statistics about its data, such as the minimum, maximum, mean, and standard deviation. For this we can use the min(), max(), mean() and std() methods.

In [None]:
# Select the column once and reuse it for each statistic
schoolquality = df["schoolquality"]
print("Min school quality", schoolquality.min())
print("Max school quality", schoolquality.max())
print("Mean school quality", schoolquality.mean())
print("Standard deviation", schoolquality.std())
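
As a shortcut, the describe() method computes these statistics (plus the count and quartiles) in one call. The sketch below uses made-up ratings, not the survey data. Note that std() and describe() report the sample standard deviation (dividing by n - 1) by default.

```python
import pandas as pd

# Hypothetical ratings, not taken from happydata.csv.
ratings = pd.Series([3, 4, 2, 5, 1], name="schoolquality")

# describe() returns a Series indexed by statistic name:
# count, mean, std, min, 25%, 50%, 75%, max.
summary = ratings.describe()

print(summary)
print("Mean from describe():", summary["mean"])
```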

### Grouping data

We can also group the data by a column. You can think of it as splitting the rows into categories based on the values of that column. For example, if we group the data by the column 'schoolquality', every row with the value 1 ends up in one group, every row with the value 2 in another group, and so on. We can then compute a statistic for each group, such as the mean of the 'policetrust' column.

In [None]:
mean_police_trust_per_school_quality = df.groupby("schoolquality")["policetrust"].mean()
print(mean_police_trust_per_school_quality)
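
A group mean is easier to interpret when we also know how many rows each group contains. One way to get both is agg() with a list of statistic names. The sketch below uses made-up values (not the survey data) so the result can be checked by hand.

```python
import pandas as pd

# Hypothetical ratings, not taken from happydata.csv.
sample_df = pd.DataFrame({
    "schoolquality": [1, 1, 2, 2, 2, 3],
    "policetrust":   [2, 4, 3, 4, 5, 5],
})

# One row per school quality value, one column per requested statistic.
trust_summary = (
    sample_df.groupby("schoolquality")["policetrust"].agg(["count", "mean"])
)

print(trust_summary)
```

Here the group with school quality 2 holds three rows (3, 4, 5), so its mean police trust is 4.0, while the group with school quality 3 rests on a single row.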