### __Pandas__

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.

Pandas lets us analyze large data sets and draw conclusions from them using statistical methods.

Pandas can also clean messy data sets and make them readable and relevant.

Relevant data is very important in data science.


If Python and pip are already installed on your system, installing Pandas is straightforward.

Install it with pip:

C:\Users\Your Name>pip install pandas

Or, if you manage the project with pipenv:

C:\Users\Your Name>pipenv install pandas
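To confirm that the installation worked, one quick check is to import the library and print its version. This is a minimal sketch; the version string you see depends on your environment.

```python
import pandas as pd

# The installed pandas version; the exact value varies by environment.
installed_version = pd.__version__
print(installed_version)
```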


##### __Import Pandas__

Once Pandas is installed, import it into your application with the import keyword:

In [1]:

import pandas

mydataset = {
  'cars': ["BMW", "Volvo", "Ford"],
  'passings': [3, 7, 2]
}

myvar = pandas.DataFrame(mydataset)

print(myvar)

    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2


##### _Pandas as pd_

Pandas is usually imported under the pd alias.

In Python, an alias is an alternate name for referring to the same thing.

Create an alias with the "as" keyword while importing:

In [2]:
import pandas as pd

mydataset = {
  'cars': ["BMW", "Volvo", "Ford"],
  'passings': [3, 7, 2]
}

myvar = pd.DataFrame(mydataset)

print(myvar)

    cars  passings
0    BMW         3
1  Volvo         7
2   Ford         2


##### __Pandas Series__

A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.

In [3]:
import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar)

0    1
1    7
2    2
dtype: int64
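A Series is not limited to integers. The sketch below, using made-up values, shows how the reported dtype changes with the data: a list of floats gives `float64`, while a list that mixes numbers and text falls back to the generic `object` dtype.

```python
import pandas as pd

floats = pd.Series([1.5, 2.5, 3.0])   # all floats
mixed = pd.Series([1, "two", 3.0])    # numbers and text together

print(floats.dtype)  # float64
print(mixed.dtype)   # object
```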


##### _Labels_

If nothing else is specified, the values are labeled with their index number: the first value has index 0, the second has index 1, and so on.

This label can be used to access a specific value.

In [4]:
print(myvar[0])

1


With the "index" argument, you can name your own labels.

In [6]:
import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a, index = ["x", "y", "z"])

print(myvar)

print()

print(myvar["y"])

x    1
y    7
z    2
dtype: int64

7
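When a Series has custom labels, it can help to state explicitly whether a lookup is by label or by position. A minimal sketch using the `.loc` (label) and `.iloc` (integer position) accessors on the same Series as above:

```python
import pandas as pd

myvar = pd.Series([1, 7, 2], index=["x", "y", "z"])

by_label = myvar.loc["y"]      # look up by label
by_position = myvar.iloc[1]    # look up by integer position (0-based)

print(by_label, by_position)   # 7 7
```

Both lookups return the same element here because "y" is the second label.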


##### _Key/Value Objects as Series_

You can also use a key/value object, like a dictionary, when creating a Series.

In [7]:
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories)

print(myvar)

day1    420
day2    380
day3    390
dtype: int64


To select only some of the items in the dictionary, use the index argument and specify only the items you want to include in the Series.

In [11]:
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories, index = ["day1", "day2"])

print(f"{myvar} \n")
print(type(myvar))

day1    420
day2    380
dtype: int64 

<class 'pandas.core.series.Series'>
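It is worth knowing what happens when the index names a key that the dictionary does not contain. In the sketch below, `"day4"` is a hypothetical label added only for illustration: pandas keeps the label and fills its value with `NaN`, which also turns the dtype into `float64`.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# "day4" is a hypothetical label that is not a key in the dictionary.
myvar = pd.Series(calories, index=["day1", "day4"])

print(myvar)
# day1    420.0
# day4      NaN
# dtype: float64
```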


##### __Pandas DataFrame__

A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.

DataFrame(_data, columns_)

##### _DataFrame Attributes_

In the context of pandas, attributes are properties of a DataFrame object that provide information about the data it contains.

Attributes in pandas allow you to access general information about a dataset without performing any data manipulation. They provide a convenient way to get an overview of the DataFrame's structure and contents.
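As a quick illustration that does not depend on any file on disk, the sketch below reads a few attributes from a small in-memory DataFrame. The column names and values are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

print(df.shape)          # (3, 2) -> 3 rows, 2 columns
print(df.size)           # 6 -> rows x columns
print(df.ndim)           # 2
print(df.empty)          # False
print(list(df.columns))  # ['calories', 'duration']
```

Attributes are read without parentheses because they are properties, not method calls.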

Pandas Data types

| Kind of data | Python | Pandas |
| --- | --- | --- |
| string | str | object |
| integer number | int | int64 |
| floating-point number | float | float64 |
| logical (Boolean) data | bool | bool |
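The mapping above can be checked on a small DataFrame built from plain Python values. This is a sketch with made-up column names. The dtype reported for the text column depends on the pandas version: older releases show `object`, while newer ones may show a dedicated string dtype, so the comments only state the numeric and Boolean results.

```python
import pandas as pd

df = pd.DataFrame({
    "visits": [1, 2, 3],          # Python int
    "ratio": [0.5, 1.5, 2.5],     # Python float
    "active": [True, False, True],  # Python bool
    "name": ["a", "b", "c"],      # Python str
})

print(df.dtypes)
# visits -> int64, ratio -> float64, active -> bool
```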

In [31]:
import pandas as pd
import os

print(f"{os.getcwd()} \n")

df = pd.read_csv('DataSets/music_log.csv')

print(f"dtypes: \n {df.dtypes} \n") # Returns the data types of each column.
print(f"index: \n {df.index} \n") # Returns the row labels as an Index object.
print(f"columns: \n {df.columns} \n")# Returns the column labels as an Index object.
print(f"shape: \n {df.shape} \n") # Returns the dimensions of the DataFrame as a tuple (rows, columns)
print(f"size: \n {df.size} \n") # Returns the total number of elements (rows × columns).
print(f"ndim: \n {df.ndim} \n") # Returns the number of dimensions (always 2 for a DataFrame).
print(f"empty: \n {df.empty} \n") # Returns True if the DataFrame is empty.
print(f"values: \n {df.values} \n") # Returns the underlying NumPy array representation.
print(f"axes: \n {df.axes} \n") # Returns a list of row and column index labels.
print(f"T: \n {df.T} \n") # Returns the transpose of the DataFrame.

c:\Users\luisp\OneDrive\Documentos\GitHub\Python-VENV\Python tutorial 

dtypes: 
   user_id      object
total play    float64
Artist         object
genre          object
track          object
dtype: object 

index: 
 RangeIndex(start=0, stop=67963, step=1) 

columns: 
 Index(['  user_id', 'total play', 'Artist', 'genre', 'track'], dtype='object') 

shape: 
 (67963, 5) 

size: 
 339815 

ndim: 
 2 

empty: 
 False 

values: 
 [['BF6EA5AF' 92.85138808302445 'Marina Rei' 'pop' 'Musica']
 ['FB1E568E' 282.981 'Stive Morgan' 'ambient' 'Love Planet']
 ['FB1E568E' 282.981 'Stive Morgan' 'ambient' 'Love Planet']
 ...
 ['26B7058C' 292.455 'Red God' 'metal' 'Действуй!']
 ['DB0038A8' 11.529112451445515 'Less Chapell' 'pop' 'Home']
 ['FE8684F6' 0.1 nan nan nan]] 

axes: 
 [RangeIndex(start=0, stop=67963, step=1), Index(['  user_id', 'total play', 'Artist', 'genre', 'track'], dtype='object')] 

T: 
                  0             1             2                    3      \
  user_id     BF6EA5AF    
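The `columns` output above shows that the first column name is stored as `'  user_id'`, with two leading spaces, so `df['user_id']` would raise a `KeyError`. One possible cleanup is to strip the whitespace from every column name with `Index.str.strip()`. The sketch below uses a small in-memory DataFrame as a stand-in for the CSV file.

```python
import pandas as pd

# Stand-in for the CSV: the first column name has two leading spaces.
df = pd.DataFrame({
    "  user_id": ["BF6EA5AF", "FB1E568E"],
    "total play": [92.85, 282.981],
})

# Remove leading and trailing whitespace from all column names.
df.columns = df.columns.str.strip()

print(list(df.columns))  # ['user_id', 'total play']
```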

##### _DataFrame from a List_

In [33]:
import pandas as pd

atlas = [
      ['France', 'Paris'],  
        ['Russia', 'Moscow'],  
        ['China', 'Beijing'],  
        ['Mexico', 'Mexico City'],  
        ['Egypt', 'Cairo'],
]
geography = ['country', 'capital']

world_map = pd.DataFrame(data=atlas , columns=geography)

print(world_map)

  country      capital
0  France        Paris
1  Russia       Moscow
2   China      Beijing
3  Mexico  Mexico City
4   Egypt        Cairo
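If the `columns` argument is left out, pandas labels the columns with integers starting at 0. A short sketch using two rows of the same list:

```python
import pandas as pd

atlas = [["France", "Paris"], ["Russia", "Moscow"]]

# No columns argument: the columns get default integer labels.
unnamed = pd.DataFrame(atlas)

print(list(unnamed.columns))  # [0, 1]
```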


##### _DataFrame from a Dictionary_

In [35]:
import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45],
  "type": ["carbs", "glucose", "sugar"]
}

df = pd.DataFrame(data)

print(df) 

   calories  duration     type
0       420        50    carbs
1       380        40  glucose
2       390        45    sugar


##### _DataFrame from a File_

In [None]:
import pandas as pd

df = pd.read_csv('DataSets/mouse_growth_rate.csv') # Reads a .csv file. For Excel files (.xlsx, .xls, .xlsm, .xlsb), use pd.read_excel() instead.

print(df) 

print()

print(pd.options.display.max_rows) # Maximum number of rows pandas displays before truncating the output. The default is 60.

   age  mouse1  mouse2
0    1      24      18
1    2      56      36
2    3      64      50
3    4      82      68
4    5      92      72
5    6      94      72
6    7      88      74

60
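To try `read_csv` without a file on disk, one option is to wrap CSV text in `io.StringIO`, which `read_csv` accepts in place of a path. The three data rows below are copied from the output above. `to_string()` renders the whole DataFrame regardless of `max_rows`.

```python
import io

import pandas as pd

# CSV text standing in for the first rows of mouse_growth_rate.csv.
csv_text = """age,mouse1,mouse2
1,24,18
2,56,36
3,64,50
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.to_string())  # prints every row, without truncation
```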


##### _DataFrame Methods()_

In [45]:
import pandas as pd

df = pd.read_csv('DataSets/music_log.csv')

df.info() # Displays a summary of the DataFrame, including data types and memory usage.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 67963 entries, 0 to 67962
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0     user_id   67963 non-null  object 
 1   total play  67399 non-null  float64
 2   Artist      59646 non-null  object 
 3   genre       64661 non-null  object 
 4   track       64804 non-null  object 
dtypes: float64(1), object(4)
memory usage: 2.6+ MB
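The "Non-Null Count" column above implies that several columns contain missing values. One way to count them directly is `isna().sum()`. The sketch below uses three rows modeled on the music log as a stand-in for the full file.

```python
import pandas as pd

# Three rows modeled on the music log; None marks a missing value.
df = pd.DataFrame({
    "user_id": ["BF6EA5AF", "EF15C7BA", "FE8684F6"],
    "Artist": ["Marina Rei", None, None],
    "genre": ["pop", "dance", None],
})

# isna() flags missing cells; sum() counts them per column.
missing = df.isna().sum()

print(missing)
```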


In [46]:
import pandas as pd

df = pd.read_csv('DataSets/music_log.csv')

df.head() # Returns the first n rows (default is 5).

    user_id  total play        Artist    genre                   track
0  BF6EA5AF   92.851388    Marina Rei      pop                  Musica
1  FB1E568E     282.981  Stive Morgan  ambient             Love Planet
2  FB1E568E     282.981  Stive Morgan  ambient             Love Planet
3  EF15C7BA       8.966           NaN    dance     Loving Every Minute
4  82F52E69  193.776327        Rixton      pop  Me And My Broken Heart


In [47]:
import pandas as pd

df = pd.read_csv('DataSets/music_log.csv')

df.tail(10) # Returns the last n rows (here 10; the default is 5).

        user_id  total play        Artist   genre          track
67953  A06381D8       2.502   Flip Grater    folk   My Old Shoes
67954  6E8E430E  139.627717       Alt & J  trance        Emotion
67955  D83CBA77       185.0           TKN    rock    Не отступай
67956  816FBC10         2.0         89ers   dance       Go Go Go
67957  18510741       109.0   Steel Pulse  reggae  Chant A Psalm
67958  2E27DF51  220.551837  Nadine Coyle     pop  Girls On Fire
67959  4F29D4D5      26.127  Digital Hero   dance      The Model
67960  26B7058C     292.455       Red God   metal      Действуй!
67961  DB0038A8   11.529112  Less Chapell     pop           Home
67962  FE8684F6         0.1           NaN     NaN            NaN
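Both `head()` and `tail()` accept the number of rows to return. A small sketch on an in-memory DataFrame, with values taken from the mouse growth table shown earlier:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [1, 2, 3, 4, 5, 6, 7],
    "mouse1": [24, 56, 64, 82, 92, 94, 88],
})

first_two = df.head(2)    # rows with index 0 and 1
last_three = df.tail(3)   # rows with index 4, 5 and 6

print(first_two)
print(last_three)
```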


In [48]:
import pandas as pd

df = pd.read_csv('DataSets/music_log.csv')

# df.memory_usage() # Returns the memory usage of each column in bytes.
df.memory_usage(index=True, deep=True) # index=True also reports the index; deep=True measures the actual contents of object columns.

Index             132
  user_id     3937800
total play     543704
Artist        4103453
genre         3656054
track         4503010
dtype: int64
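To see what `deep=True` changes, the two calls can be compared on a small DataFrame. This is a sketch with illustrative values. A `float64` column always takes 8 bytes per value, while the figure for a text column depends on the pandas version and on how the strings are stored, so no exact number is given for it.

```python
import pandas as pd

df = pd.DataFrame({
    "genre": ["pop", "ambient", "dance"],
    "total play": [92.85, 282.981, 8.966],
})

shallow = df.memory_usage()          # default: does not inspect object contents
deep = df.memory_usage(deep=True)    # also measures the stored strings

print(shallow)
print(deep)
# The "total play" entry is 24 bytes in both results (3 values x 8 bytes).
```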

In [49]:
import pandas as pd

df = pd.read_csv('DataSets/music_log.csv')

df.describe() # Returns a summary of statistics for numerical columns. 

       total play
count     67399.0
mean    98.899155
std    144.460713
min           0.0
25%         2.019
50%     20.135056
75%       194.335
max      8638.736
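By default `describe()` summarizes only the numeric columns, which is why the output above contains just `total play`. Passing `include="all"` adds the remaining columns, with `unique`, `top` and `freq` rows for text data. A minimal sketch on illustrative values:

```python
import pandas as pd

df = pd.DataFrame({
    "genre": ["pop", "pop", "dance"],
    "total play": [92.85, 282.981, 8.966],
})

# include="all" summarizes text columns as well as numeric ones.
summary = df.describe(include="all")

print(summary)
```

Cells that do not apply to a column, such as `mean` for `genre`, are shown as `NaN`.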


##### _DataFrame Indexing_