# Pandas


## What Is Pandas in Python?
Pandas is an open-source Python package that is widely used for data science, data analysis, and machine learning tasks. It is built on top of NumPy, which provides support for multi-dimensional arrays. As one of the most popular data wrangling packages, Pandas works well with many other data science libraries in the Python ecosystem. It comes preinstalled in scientific Python distributions such as Anaconda; with a standard Python install, it can be added with `pip install pandas`.

## What Can You Do with DataFrames Using Pandas?
Pandas makes it simple to do many of the time-consuming, repetitive tasks associated with working with data, including:

- Data cleansing
- Data fill
- Data normalization
- Merges and joins
- Data visualization
- Statistical analysis
- Data inspection
- Loading and saving data
- And much more

These capabilities are a big part of why Pandas has become one of the most widely used data analysis and manipulation tools in Python.
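
To make a few of the tasks listed above concrete, here is a minimal sketch on a tiny made-up DataFrame. The names `sales` and `publishers` and all game and publisher values are hypothetical. It shows one common way to handle cleansing (`drop_duplicates`), filling (`fillna`), min-max normalization, and a merge; these are standard pandas methods, but each task can also be done in other ways.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration only.
sales = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game C"],
    "Sales": [4.0, None, 2.0, 2.0],
})
publishers = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Publisher": ["Pub X", "Pub Y", "Pub X"],
})

# Data cleansing: drop the exact duplicate row for "Game C".
clean = sales.drop_duplicates()

# Data fill: replace the missing Sales value with 0.
filled = clean.fillna({"Sales": 0.0})

# Data normalization: min-max scale Sales into the range [0, 1].
s = filled["Sales"]
filled = filled.assign(Sales_scaled=(s - s.min()) / (s.max() - s.min()))

# Merges and joins: attach each game's publisher by matching on Name.
merged = filled.merge(publishers, on="Name", how="left")
print(merged)
```

Here the merge uses `how="left"` so every row in `filled` is kept even if a publisher were missing.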

```python
# pip install pandas
import pandas as pd
```

```python
# read_csv loads a CSV file into a DataFrame; index_col="Name" uses the
# Name column as the row labels (the index) instead of a regular column
games = pd.read_csv("vgsalesGlobal.csv", index_col="Name")
```
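
Since `vgsalesGlobal.csv` may not be on hand, the sketch below feeds `read_csv` a few made-up CSV lines through `io.StringIO` to show what `index_col="Name"` does, then writes the result back out with `to_csv`. The rows and column names are assumptions chosen for illustration, not the real file's contents.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a real file; columns are assumptions.
csv_text = """Name,Platform,Year,Genre,Global_Sales
Game A,PS4,2016,Action,3.5
Game B,Wii,2008,Sports,1.2
Game C,PC,2012,Strategy,0.4
"""

# read_csv accepts a path or any file-like object; index_col="Name"
# turns the Name column into the row labels.
df = pd.read_csv(io.StringIO(csv_text), index_col="Name")
print(df)

# Saving works in reverse; to_csv writes the index as the first column by default.
out = io.StringIO()
df.to_csv(out)
print(out.getvalue())
```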

```python
games.head()  # head() shows the first 5 rows of the dataset
```

```python
games.tail(10)  # tail(n) shows the last n rows (5 by default); here, the last 10
```


```python
len(games)  # number of rows in the dataset
```

```python
games.shape  # (number of rows, number of columns)
```

```python
games.dtypes  # the data type of each column
```
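
The inspection cells above depend on the real dataset. As a self-contained sketch, this is what the same calls return on a small made-up DataFrame; the titles and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical titles and numbers, made up for illustration only.
df = pd.DataFrame(
    {
        "Year": [2006, 1985, 2008, 2009, 2015, 2011, 2017],
        "Genre": ["Sports", "Platform", "Racing", "Sports", "Action", "Action", "Shooter"],
        "Global_Sales": [8.2, 4.0, 3.5, 3.3, 2.0, 1.5, 1.0],
    },
    index=[f"Game {i}" for i in range(7)],
)

print(df.head())   # first 5 rows (the default)
print(df.tail(2))  # last 2 rows: Game 5 and Game 6
print(len(df))     # number of rows: 7
print(df.shape)    # (rows, columns): (7, 3)
print(df.dtypes)   # Year int64, Genre strings (object or str by version), Global_Sales float64
```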

```python
games.iloc[299]  # iloc selects by integer position (0-based), so this is the 300th row
```

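```python
# loc selects by index label; Name is the index, so this looks up a game by title
# (if the title appears more than once, every matching row is returned as a DataFrame)
games.loc["Super Mario Bros."]
```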
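
`iloc` and `loc` differ in a way that is easy to trip over: `iloc` always counts positions, while `loc` matches labels, and a label that appears more than once returns all matching rows as a DataFrame instead of a single row as a Series. A minimal sketch with a made-up index in which one title is repeated:

```python
import pandas as pd

# Hypothetical rows; "Game B" appears twice as an index label.
df = pd.DataFrame(
    {"Platform": ["NES", "GB", "GB", "PS4"], "Global_Sales": [40.2, 5.1, 3.0, 1.0]},
    index=["Game A", "Game B", "Game B", "Game C"],
)

row = df.iloc[1]           # by integer position (0-based): the second row
print(type(row).__name__)  # Series

one = df.loc["Game A"]     # unique label: a single row as a Series
many = df.loc["Game B"]    # repeated label: all matching rows as a DataFrame
print(type(one).__name__, type(many).__name__)  # Series DataFrame
```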

```python
games.head()
```

```python
games.sort_values(by="Year").head()  # sort rows by Year (ascending by default) and show the first 5
```

```python
games.sort_values(by="Year", ascending=False).head()  # ascending=False reverses the order: newest first
```

```python
# sort by Year, then by Genre to break ties; string columns sort
# alphabetically, so with ascending=False Genre runs from Z to A
games.sort_values(by=["Year", "Genre"], ascending=False).head()
```
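
`ascending` also accepts a list with one entry per sort column, which lets each column sort in its own direction. A small sketch on made-up rows, compared with `ascending=False` applied to both columns:

```python
import pandas as pd

# Hypothetical rows, made up for illustration only.
df = pd.DataFrame(
    {"Year": [2010, 2012, 2010, 2012], "Genre": ["Sports", "Action", "Action", "Racing"]},
    index=["Game A", "Game B", "Game C", "Game D"],
)

# Newest year first, then Genre A to Z within each year.
mixed = df.sort_values(by=["Year", "Genre"], ascending=[False, True])

# Both columns descending: Genre runs Z to A within each year.
both_desc = df.sort_values(by=["Year", "Genre"], ascending=False)

print(mixed)
print(both_desc)
```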


```python
games.sort_index().head()  # sorted by name, since Name is the index
```

```python
games["Publisher"].head(10)  # Publisher column for the first 10 rows; the Name index is shown alongside it
```
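
Single brackets with one column name return a Series, whose index (here the game names) is displayed next to the values; passing a list of names returns a DataFrame with just those columns. A small sketch with hypothetical titles and publishers:

```python
import pandas as pd

# Hypothetical titles and publishers, made up for illustration only.
df = pd.DataFrame(
    {"Publisher": ["Pub X", "Pub Y", "Pub Z"], "Genre": ["Platform", "Racing", "Action"]},
    index=pd.Index(["Game A", "Game B", "Game C"], name="Name"),
)

publishers = df["Publisher"]         # one name: a Series indexed by Name
subset = df[["Publisher", "Genre"]]  # a list of names: a DataFrame
print(type(publishers).__name__, type(subset).__name__)  # Series DataFrame
```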

```python
games[games["Genre"] == "Action"]  # get the rows in which Genre is Action
```

```python
# store the boolean mask (True where Genre is "Action") so it can be reused,
# then use it to select the matching rows
games_by_genre = games["Genre"] == "Action"
games[games_by_genre]
```

```python
# a second mask for games released in 2010; & keeps rows where both masks are True
games_in_2010 = games["Year"] == 2010
games[games_by_genre & games_in_2010]
```

```python
games[games_by_genre | games_in_2010]  # | keeps rows where either mask is True
```
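
The masks can also be written inline, but each comparison then needs its own parentheses because `&` and `|` bind more tightly than `==`. Python's `and`/`or` do not work on masks and raise a `ValueError`; `~` negates a mask. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical rows, made up for illustration only.
df = pd.DataFrame(
    {"Genre": ["Action", "Action", "Sports", "Puzzle"], "Year": [2010, 2012, 2010, 2009]},
    index=["Game A", "Game B", "Game C", "Game D"],
)

# Parentheses around each comparison are required when combining inline.
both = df[(df["Genre"] == "Action") & (df["Year"] == 2010)]
either = df[(df["Genre"] == "Action") | (df["Year"] == 2010)]

# ~ flips a mask: every row that is not an Action game.
not_action = df[~(df["Genre"] == "Action")]
print(both, either, not_action, sep="\n\n")
```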

```python
after_2015 = games["Year"] > 2015  # display games released after 2015
games[after_2015]
```

```python
mid_2000s = games["Year"].between(2000, 2010)  # 2000 to 2010, both ends included
games[mid_2000s]
```
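
`between` includes both endpoints by default. In pandas 1.3 and later, the `inclusive` argument accepts `"both"`, `"neither"`, `"left"`, or `"right"` to change that; the sketch below assumes such a version and uses a made-up Series of years.

```python
import pandas as pd

# Hypothetical years, made up for illustration only.
years = pd.Series([1999, 2000, 2005, 2010, 2011])

both_ends = years.between(2000, 2010)                    # 2000 and 2010 included
left_only = years.between(2000, 2010, inclusive="left")  # 2000 included, 2010 excluded
print(both_ends.tolist())
print(left_only.tolist())
```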

```python
# case-insensitive title search: lowercase the index (game names), then keep
# the titles that contain "sport" (this also matches "Sports")
sport_in_title = games.index.str.lower().str.contains("sport")
games[sport_in_title]
```
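
`str.contains` can also do the case-insensitive match itself with `case=False`, and `na=False` treats missing titles as non-matches instead of producing missing values in the mask. A sketch with hypothetical titles, one of them missing:

```python
import pandas as pd

# Hypothetical titles; the last one is missing to show what na=False does.
df = pd.DataFrame(
    {"Global_Sales": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=["Tennis Sports", "SPORTS Party", "Kart Racer", "Puzzle Land", None],
)

# case=False matches "sport", "Sport", "SPORTS", ...; na=False turns the
# missing title into False so the mask is purely boolean.
mask = df.index.str.contains("sport", case=False, na=False)
sporty = df[mask]
print(sporty)
```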

```python
games["Global_Sales"].mean()  # mean of Global_Sales across all games
```

```python
genres = games.groupby("Genre")  # group rows by Genre; nothing is computed until an aggregation is applied
```

```python
genres["Global_Sales"].sum()  # total Global_Sales for each genre
```

```python
genres["Global_Sales"].sum().sort_values(ascending=False)  # sort the totals in decreasing order
```
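
To get several statistics per genre in one step, `agg` takes a list of aggregation names and returns one column per statistic. A minimal sketch on made-up sales figures, sorted by total in decreasing order as in the cell above:

```python
import pandas as pd

# Hypothetical sales figures, made up for illustration only.
df = pd.DataFrame(
    {
        "Genre": ["Action", "Sports", "Action", "Puzzle", "Sports"],
        "Global_Sales": [3.0, 5.0, 1.0, 0.5, 2.0],
    }
)

# One row per genre, with sum, mean, and count columns.
summary = (
    df.groupby("Genre")["Global_Sales"]
    .agg(["sum", "mean", "count"])
    .sort_values("sum", ascending=False)
)
print(summary)
```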