# .columns and Column Filtering
- The `.columns` attribute of a DataFrame returns its column names as an Index object.
- *As a reminder, a pandas index holds the labels used to address rows or columns.*
- Syntax: `df.columns`
- It can be converted to a plain Python list with the built-in `list()` function: `list(df.columns)`

In [5]:
import pandas as pd

# Importing CSV dataset
df = pd.read_csv("studentsperformance.csv")

In [6]:
df.columns  # Index object (not displayed: a cell shows only its last expression)

# Converting to a list
list(df.columns)

['gender',
 'age',
 'parental_level_of_education',
 'lunch',
 'test_preparation_course',
 'maths_score',
 'reading_score',
 'writing_score']
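
As a minimal sketch of the same idea on a small in-memory DataFrame (the `toy` frame and its values are made up for illustration and are not part of the students dataset), the following shows that `.columns` is an `Index` object, while `list()` gives a plain Python list:

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "gender": ["female", "male"],
    "maths_score": [72, 47],
})

cols = toy.columns            # pandas Index object holding the column labels
col_list = list(toy.columns)  # plain Python list of the same labels

print(type(cols).__name__)    # Index
print(col_list)               # ['gender', 'maths_score']
```

A plain list is convenient when the names need to be looped over or edited, since an `Index` object is immutable.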

## .unique()
- Used to find all the distinct values within a Series (a single column of a DataFrame). 
- It returns a NumPy array of these values in the order they first appeared. 
- Syntax: `df.columnname.unique()` OR `df['Column Name'].unique()`

In [7]:
df['gender'].unique()  # distinct values in the gender column (not displayed: a cell shows only its last expression)
df.parental_level_of_education.unique()


array(["bachelor's degree", 'some college', "master's degree",
       "associate's degree", 'high school', 'some high school'],
      dtype=object)
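
To illustrate the "order of first appearance" behaviour, here is a small sketch on a made-up Series (the `lunch` values below are hypothetical, not taken from the dataset). It also uses `.nunique()`, which returns only the number of distinct values:

```python
import pandas as pd

# Hypothetical toy column, used only for illustration
lunch = pd.Series(["standard", "free/reduced", "standard", "standard"])

distinct = lunch.unique()     # distinct values, in order of first appearance
n_distinct = lunch.nunique()  # number of distinct values (NaN excluded by default)

print(list(distinct))         # ['standard', 'free/reduced']
print(n_distinct)             # 2
```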

## Counting using .value_counts()
- When working with categorical values, you will often want to count the number of observations in each category of a column.
- The `.value_counts()` method returns these counts, sorted from most to least frequent.
- Syntax: `df.columnname.value_counts()` OR `df['Column Name'].value_counts()`

In [8]:
df.gender.value_counts()

gender
female    518
male      482
Name: count, dtype: int64
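
The counts can also be expressed as proportions. The sketch below uses a made-up Series (hypothetical values, not the dataset) and the `normalize=True` parameter of `.value_counts()`:

```python
import pandas as pd

# Hypothetical toy column, used only for illustration
gender = pd.Series(["female", "male", "female", "female"])

counts = gender.value_counts()                # absolute counts, most frequent first
shares = gender.value_counts(normalize=True)  # relative frequencies that sum to 1

print(counts["female"], counts["male"])       # 3 1
print(shares["female"])                       # 0.75
```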

#### Selecting columns of your own choice
- When an analysis needs only a few columns rather than the entire DataFrame, select just those columns.
- Assign the result to a new DataFrame name, e.g. `df_new`.
- Syntax: `df[["column_1", "column_2", "column_3", ...]]`

In [9]:
df_new = df[["gender","reading_score","maths_score","writing_score"]]
df_new

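| | gender | reading_score | maths_score | writing_score |
|---|---|---|---|---|
| 0 | female | 72 | 72 | 74 |
| 1 | female | 90 | 69 | 88 |
| 2 | female | 95 | 90 | 93 |
| 3 | male | 57 | 47 | 44 |
| 4 | male | 78 | 76 | 75 |
| ... | ... | ... | ... | ... |
| 995 | female | 99 | 88 | 95 |
| 996 | male | 55 | 62 | 55 |
| 997 | female | 71 | 59 | 65 |
| 998 | female | 78 | 68 | 77 |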
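
The double brackets matter: the inner pair is a Python list of column names. A short sketch on made-up data (the `toy` frame is hypothetical) shows the difference between single and double brackets:

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "gender": ["female", "male"],
    "reading_score": [72, 57],
    "writing_score": [74, 44],
})

one_col = toy["reading_score"]              # single brackets -> Series
sub_df = toy[["gender", "reading_score"]]   # double brackets -> DataFrame

print(type(one_col).__name__)               # Series
print(type(sub_df).__name__, sub_df.shape)  # DataFrame (2, 2)
```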


## Filtering data using conditions
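- Syntax: `df[df['column'] >= value]` (any comparison operator such as `<`, `>`, `==` or `!=` works) OR `df[(df['column_1'] == value_1) & (df['column_2'] == value_2)]`
- When combining conditions with `&` (and) or `|` (or), wrap each condition in its own parentheses.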

In [10]:
df_score = df[(df.gender == 'male') & (df.maths_score == 90)]
df_score

| | gender | age | parental_level_of_education | lunch | test_preparation_course | maths_score | reading_score | writing_score |
|---|---|---|---|---|---|---|---|---|
| 299 | male | 22.0 | associate's degree | free/reduced | none | 90 | 87 | 75 |
| 333 | male | 24.0 | associate's degree | standard | none | 90 | 78 | 81 |
| 659 | male | 24.0 | associate's degree | standard | none | 90 | 87 | 85 |
| 808 | male | 28.0 | high school | standard | none | 90 | 75 | 69 |
| 845 | male | 21.0 | master's degree | standard | none | 90 | 85 | 84 |
| 873 | male | 24.0 | associate's degree | free/reduced | none | 90 | 90 | 82 |
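
Conditions can be combined with `&` (and), `|` (or) and `~` (not). The following is a minimal sketch on made-up data (the `toy` frame is hypothetical, not the students dataset):

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "gender": ["male", "female", "male", "female"],
    "maths_score": [90, 90, 47, 72],
})

# AND: both conditions must hold; each one is wrapped in parentheses
both = toy[(toy["gender"] == "male") & (toy["maths_score"] >= 90)]

# OR: at least one condition must hold
either = toy[(toy["gender"] == "male") | (toy["maths_score"] >= 90)]

# NOT: invert a condition with ~
not_male = toy[~(toy["gender"] == "male")]

print(len(both), len(either), len(not_male))  # 1 3 2
```

The parentheses are required because `&` and `|` bind more tightly than comparison operators in Python.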


#### Row-Level Filtering


In [11]:
# Find all rows where the parental level of education is "bachelor's degree"

# Initial step: check the exact spelling of the category label before filtering
df.parental_level_of_education.unique()

array(["bachelor's degree", 'some college', "master's degree",
       "associate's degree", 'high school', 'some high school'],
      dtype=object)

In [12]:
# Final step: keep only the rows whose value equals "bachelor's degree"
df[df.parental_level_of_education == "bachelor's degree"]

| | gender | age | parental_level_of_education | lunch | test_preparation_course | maths_score | reading_score | writing_score |
|---|---|---|---|---|---|---|---|---|
| 0 | female | 28.0 | bachelor's degree | standard | none | 72 | 72 | 74 |
| 24 | male | 24.0 | bachelor's degree | free/reduced | completed | 74 | 71 | 80 |
| 27 | female | 21.0 | bachelor's degree | standard | none | 67 | 69 | 75 |
| 60 | male | 30.0 | bachelor's degree | free/reduced | completed | 79 | 74 | 72 |
| 77 | male | 30.0 | bachelor's degree | standard | completed | 80 | 78 | 81 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 916 | male | 24.0 | bachelor's degree | standard | completed | 100 | 100 | 100 |
| 933 | male | 24.0 | bachelor's degree | free/reduced | completed | 70 | 75 | 74 |
| 969 | female | 25.0 | bachelor's degree | standard | none | 75 | 84 | 80 |
| 970 | female | 25.0 | bachelor's degree | standard | none | 89 | 100 | 100 |
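
The row labels in a filtered result are not renumbered because the filter works through a boolean mask. A minimal sketch on made-up data (the `toy` frame is hypothetical) shows the intermediate mask and the preserved labels:

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({
    "parental_level_of_education": ["bachelor's degree", "some college", "bachelor's degree"],
    "maths_score": [72, 69, 74],
})

# The comparison produces a boolean Series, one True/False per row
mask = toy["parental_level_of_education"] == "bachelor's degree"

print(mask.tolist())   # [True, False, True]
print(mask.sum())      # 2 (each True counts as 1)

# Indexing with the mask keeps only the rows marked True; original labels are preserved
bachelors = toy[mask]
print(bachelors.index.tolist())  # [0, 2]
```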
