# <font color="green">Methods, Functions and Attributes</font>


## Introduction to Methods, Functions, and Attributes in Pandas

In the world of data analysis with Python, pandas stands out as one of the most powerful and versatile libraries. It offers a rich ecosystem of functionality designed to make data manipulation and analysis both straightforward and efficient. Central to its design are the concepts of methods, functions, and attributes, each playing a crucial role in the data analysis workflow.

## Understanding the Basics

- **Methods**: These are actions you can perform on pandas objects (like DataFrames and Series). Methods are invoked using dot notation and can perform operations such as filtering, grouping, and transforming data. Examples include `.groupby()`, `.merge()`, and `.sort_values()`.

- **Functions**: In the context of pandas, functions are standalone procedures called by their name and can take one or more arguments, including pandas objects. Functions perform specific tasks on the data and return a result. Common pandas functions include `pd.read_csv()` for reading data into a DataFrame and `pd.to_datetime()` for converting a column to datetime format.

- **Attributes**: Attributes are properties of pandas objects that you can access to gain information about the data's structure or content without modifying the data itself. They are accessed using dot notation. For instance, `DataFrame.shape` returns the dimensions of the DataFrame, and `Series.dtype` gives the data type of the Series.
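The three ideas above can be put side by side in one short sketch. The DataFrame below is made-up illustration data (the names `scores`, `joined_dates`, `ranked`, and `dims` are not part of the original notebook).

```python
import pandas as pd

# Made-up data, used only for illustration.
scores = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo"],
    "joined": ["2024-01-05", "2023-11-20", "2024-03-14"],
    "score": [82, 95, 78],
})

# Function: called by its name, the pandas object is passed as an argument.
joined_dates = pd.to_datetime(scores["joined"])

# Method: called on the object with dot notation and parentheses.
ranked = scores.sort_values("score", ascending=False)

# Attribute: accessed with dot notation, without parentheses.
dims = scores.shape

print(joined_dates.dt.year.tolist())  # [2024, 2023, 2024]
print(ranked["name"].tolist())        # ['Ben', 'Ana', 'Cleo']
print(dims)                           # (3, 3)
```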

## The Power of Pandas

Combining methods, functions, and attributes allows for a highly expressive and intuitive approach to data analysis. Whether you're reshaping your data, merging multiple datasets, or simply exploring your data's characteristics, these tools provide the foundation for a robust data analysis pipeline.

In this guide, we will dive deeper into how methods, functions, and attributes are used in pandas to manipulate, analyze, and gain insights from data. Understanding these concepts is key to leveraging the full potential of pandas and becoming proficient in data analysis.


## Key Pandas Functions

Some of the most used pandas functions:



- **DataFrame Creation (`pd.DataFrame(...)`):** The `DataFrame` is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). Creating a `DataFrame` involves structuring your data into this format, allowing for more complex data manipulations.

- **Series Creation (`pd.Series(...)`):** A `Series` is a one-dimensional array with labels. It can hold data of any type (integer, string, float, Python objects, etc.). The `Series` creation function is fundamental for working with single-dimensional data, providing a simple yet powerful tool for data analysis.

- **Reading CSV Files (`pd.read_csv(file)`):** This function is used to read a table from a CSV file into a DataFrame. It's one of the most common ways of importing data into pandas for further processing and analysis.

- **Finding Unique Values (`pd.unique(values)`):** `pd.unique` returns the unique values from a one-dimensional array-like object, helping identify distinct values within a dataset. This function is crucial for data cleaning and exploration, allowing analysts to quickly understand the variety of data they are dealing with.

- **Combining Data (`pd.concat([df1, df2])`):** This function is used to concatenate pandas objects along a particular axis. `pd.concat` can be used to combine Series and DataFrame objects, providing a flexible way to add new rows or columns to your dataset.

These functions are just the tip of the iceberg when it comes to pandas' capabilities. As you delve deeper into data analysis with pandas, you'll discover a rich ecosystem of tools designed to tackle virtually any data manipulation task.
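As a small illustration of two of the functions listed above, the sketch below stacks two DataFrames with `pd.concat` and then asks `pd.unique` for the distinct values of one column. The tables and the names `first`, `second`, `combined`, and `distinct` are invented for this example.

```python
import pandas as pd

# Two small, made-up tables with the same columns.
first = pd.DataFrame({"city": ["Lima", "Oslo"], "visits": [3, 5]})
second = pd.DataFrame({"city": ["Oslo", "Kyoto"], "visits": [2, 7]})

# Stack the rows; ignore_index=True renumbers the rows 0..n-1.
combined = pd.concat([first, second], ignore_index=True)

# Distinct values, in order of first appearance.
distinct = pd.unique(combined["city"])

print(combined.shape)   # (4, 2)
print(list(distinct))   # ['Lima', 'Oslo', 'Kyoto']
```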

#### Example: `pd.read_csv()` 

The `pd.read_csv()` function is a fundamental tool in pandas for importing data from CSV files into pandas DataFrames. Here's a closer look at how it works and its significance:

- **Required Parameter**: The only required parameter of `pd.read_csv()` is `filepath_or_buffer`. It is usually a string holding the file path or URL of the CSV file you intend to read; path objects and open file-like objects are accepted as well.

- **Standalone Function**: Unlike methods that need to be called on an object, `pd.read_csv()` is a standalone function. It doesn't depend on any prior object instantiation and can be used directly after importing pandas.

- **Functionality**: At its core, `pd.read_csv()` is designed to take a file path as input and return a DataFrame. This DataFrame represents the data contained within the CSV file, structured in a tabular format that is easy to manipulate and analyze within pandas.

- **Location within Pandas**: This function is part of the pandas library's top-level namespace. Once you've imported pandas (typically as `pd`), `pd.read_csv()` is readily accessible and ready to transform your CSV data into a powerful DataFrame for analysis.

This function is crucial for data scientists and analysts who often work with data stored in CSV format, providing a seamless bridge between raw data files and the robust analytical capabilities of pandas.
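To try `pd.read_csv()` without needing a file on disk, the sketch below feeds it an in-memory text buffer, which the function accepts in place of a path. The CSV text is made up for illustration and is not the notebook's `large_countries_2015.csv`.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a real file.
csv_text = "country,population\nA,10\nB,20\n"

# io.StringIO wraps the text in a file-like object that read_csv can consume.
table = pd.read_csv(io.StringIO(csv_text))

print(list(table.columns))        # ['country', 'population']
print(table.shape)                # (2, 2)
print(table["population"].sum())  # 30
```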

In [None]:
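import pandas as pd  # conventional alias, used throughout this notebook
import pandas        # full name, used later for the tab-completion demo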

In [None]:
# let's pass a relative filepath to the large_countries_2015.csv 

pd.read_csv('./data/large_countries_2015.csv') 

In [None]:
# Save the dataframe in a variable

df = pd.read_csv('./data/large_countries_2015.csv') 

In [None]:
df

### Other function examples

The **len()** function is a built-in Python function that returns the number of items in an object. When you apply **len()** to a container object (like a list, tuple, string, dictionary, or set), it counts and returns the total number of elements within that object. Applied to a pandas DataFrame, it returns the number of rows.
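A quick sketch of `len()` on a few different containers, using made-up values (the DataFrame `frame` is hypothetical and separate from the notebook's `df`):

```python
import pandas as pd

print(len([10, 20, 30]))       # 3 items in the list
print(len("pandas"))           # 6 characters in the string
print(len({"a": 1, "b": 2}))   # 2 keys in the dictionary

frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
print(len(frame))              # 3 rows
print(len(frame.columns))      # 2 columns
```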

In [None]:


len(df)


In [None]:
# what if we run the function without the parentheses?

len

In [None]:
# also save it in a variable

length_df = len(df)


In [None]:
length_df

Python built-in functions: https://docs.python.org/3/library/functions.html

### Functions related to packages and libraries

In [None]:
# pd.DataFrame() from a dictionary

pd.DataFrame(
    {'a':[1,2,3],
     'b':[4,5,6]
    }
)

In [None]:
# pd.Series() from a list

pd.Series([1,2,3,4,5,6])


**Comments:**
- Python has built-in functions, as we saw in the functions encounter (`len`, `sum`, `input`, ...)
- most Python libraries provide functions of their own
- you can write custom functions to make your job easier
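As an example of the last point, here is a minimal custom function. The name `column_range` and the sample values are invented for illustration.

```python
import pandas as pd


def column_range(series):
    """Return the difference between the largest and smallest value."""
    return series.max() - series.min()


values = pd.Series([4, 9, 1, 7])
spread = column_range(values)

print(spread)  # 8
```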

## 2. Methods

In Python, methods are functions that are associated with objects and can access and modify the state of those objects. Unlike standalone functions, methods are invoked on specific instances of classes, allowing for object-oriented programming practices. Here's a closer look at the characteristics of methods:

- **Association with Objects**: Methods are inherently tied to the objects on which they are called. This association means that a method is always attached to an object's class and is accessed through the object itself. For instance, if you have a list object named `my_list`, you can call its method `append` by using `my_list.append(item)`.

- **Access to Object State**: One of the key features of methods is their ability to interact with the object's state (its attributes or properties). Since methods are part of an object's definition, they have direct access to other attributes and methods of the object. This allows them to read the object's current state, modify it, or even call other methods of the same object to perform operations.

- **Syntax and Calling Convention**: Methods are called by specifying the object name followed by a dot (`.`) and the method name. Any parameters the method requires are passed inside parentheses. This syntax emphasizes the method's connection to the object. For example, `object.method(param1, param2)`.

- **Self Parameter**: In the method definition within a class, the first parameter is often `self`, which is a reference to the instance on which the method is called. This mechanism allows the method to access and manipulate the instance's attributes and call other methods within the same object.

- **Enhances Encapsulation**: By allowing methods to operate on the data within the object they belong to, Python reinforces the principle of encapsulation. This design hides the internal state of an object from the outside world and only exposes operations (methods) that are safe for the object's integrity.

In summary, methods are a fundamental part of Python's approach to object-oriented programming, enabling interactions with and between objects through a well-defined interface of actions that objects can perform or have performed on them.


The difference between a method and a function is that a function is handed its data as an argument, whereas a method operates on the object it is called on. In this case that object would be a pandas DataFrame.
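The points about `self`, object state, and the contrast with standalone functions can be sketched with a tiny class. `Counter` and `add_step` are hypothetical names written only for this illustration; they are not part of pandas.

```python
class Counter:
    """A tiny class used only to illustrate methods and object state."""

    def __init__(self, start=0):
        self.value = start  # attribute: holds the object's state

    def increment(self, step=1):
        # `self` is the instance the method was called on.
        self.value += step
        return self.value


def add_step(value, step=1):
    """Standalone function: works only with the data it is given."""
    return value + step


counter = Counter()
counter.increment()        # method call changes the object's state
counter.increment(step=5)

print(counter.value)       # 6
print(add_step(6))         # 7, and `counter` itself is untouched
```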

## Key Pandas Methods

| Command               | Description                                             |
|-----------------------|---------------------------------------------------------|
| `df.to_csv(file)`     | Write a table to disk.                                  |
| `df.sum()`            | Returns the sum of the values over the requested axis.  |
| `df.sort_values()`    | Sorts by the values along either axis.                  |
| `df.count()`          | Returns count of non-NA cells for each column or row.   |
| `df.nunique()`        | Returns the number of unique values in a Series or DataFrame. |
| `df['col'].str.len()` | Returns the length of each string in pandas Series.     |


### Notes:

- When applying Python string methods to a pandas Series, the method must be preceded by the `.str` accessor in order to be called correctly.
- Keep in mind that many, but not all, pandas methods can be applied to both DataFrames and Series.
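The methods from the table, including the `.str` accessor, can be tried on a small made-up DataFrame. The table `sales` and its columns are assumptions for this sketch, not the notebook's data.

```python
import pandas as pd

# Made-up data with one missing value in "units".
sales = pd.DataFrame({
    "region": ["north", "south", "north", "east"],
    "units": [10.0, 4.0, None, 7.0],
})

total_units = sales["units"].sum()              # missing values are skipped
non_missing = sales.count()                     # non-NA cells per column
distinct_regions = sales["region"].nunique()    # number of unique values
by_units = sales.sort_values("units", ascending=False)  # NaN is placed last

# String methods go through the .str accessor on a Series.
name_lengths = sales["region"].str.len()

print(total_units)                # 21.0
print(non_missing["units"])       # 3
print(distinct_regions)           # 3
print(by_units.index.tolist())    # [0, 3, 1, 2]
print(name_lengths.tolist())      # [5, 5, 5, 4]
```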

In our example, `df` is the object in question: a DataFrame object. It is a small DataFrame and, like every object in Python, is constrained by the architecture of its datatype.

In [None]:
# what is the datatype of df?

type(df)


In a Jupyter notebook, type the variable name of an object followed by a period, then press the Tab key to get a dropdown-like menu of all the available methods and attributes. Try it below.

In [None]:
df.

In [None]:
pandas.

### Inspect the content of the dataframe

In [None]:
# display the first few rows

df.head()


In [None]:
# display a few random rows

df.sample(4)

Now execute one of the methods on the object such as `.sum()`

In [None]:
# sum the values from the dataframe

df.sum()

In [None]:
'Name'+'Surname'

The `.sum()` method adds up the total of each column. In the case of strings, it concatenates them together. What is important to see here is that a method performs an operation on the object it is associated with.
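To see the column-wise behaviour on its own, here is a sketch with a small numeric DataFrame (made-up values). By default `.sum()` returns one total per column; passing `axis=1` sums across each row instead.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

column_totals = frame.sum()      # one total per column
row_totals = frame.sum(axis=1)   # one total per row

print(column_totals.to_dict())   # {'a': 6, 'b': 15}
print(row_totals.tolist())       # [5, 7, 9]
```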

In [None]:
# what is the overall population?

df['population'].sum()

### String methods 



In [None]:
# get the length of each of the continent names

df['continent'].str.len()

In [None]:
# assign this information as a new column

df['continent_len']  =  df['continent'].str.len()

In [None]:
df

### Saving your data to a file

In [None]:
# save your data to a file

df.to_csv('new_table.csv', header=False, index=False)
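To see what `header=False` and `index=False` do without writing a file, note that `to_csv` returns the CSV text as a string when no path is given. The sketch below uses a made-up DataFrame named `small`.

```python
import pandas as pd

small = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_header = small.to_csv(index=False)
without_header = small.to_csv(index=False, header=False)

print(with_header.splitlines())     # ['a,b', '1,3', '2,4']
print(without_header.splitlines())  # ['1,3', '2,4']
```

Without the header row the column names are lost, so a file written this way needs the names supplied again when it is read back.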

## 3. Attributes



Attributes in Python play a crucial role in representing the properties or characteristics of an object. They are fundamentally different from methods, focusing on providing information rather than performing actions. Below are the key aspects of attributes, particularly within the pandas library context, which is widely used for data manipulation and analysis.

## Nature of Attributes

- **What They Are**: Attributes are data values tied to a specific object. They encapsulate the properties or state of an object, offering insights into its characteristics without altering it.
- **Example in pandas**: In a pandas DataFrame, an attribute like `shape` indicates the DataFrame's dimensions, providing the number of rows and columns. This insight is essential for understanding the data structure you're working with.

## Accessing Attributes

- **How to Access**: Attributes are accessed using dot notation but, importantly, without parentheses. This distinction from method calls is crucial because attributes are not functions and do not accept parameters.
- **Dot Notation**: Accessing an attribute looks like `object.attribute`, where `object` is your specific Python object (like a DataFrame) and `attribute` is the property you wish to retrieve.

## Example - DataFrame.`shape` Attribute

- **Utility**: The `shape` attribute of a pandas DataFrame returns a tuple representing the DataFrame's size as `(rows, columns)`. It is a direct way to get the DataFrame's dimensions.
- **Usage**: You can access this attribute as `df.shape` for a DataFrame named `df`. This provides a simple yet effective insight into how large the DataFrame is, facilitating further data processing or analysis decisions.

In summary, attributes are a foundational concept in Python, allowing for a clear delineation between the actions an object can perform (methods) and the information it holds (attributes). This distinction is vital in data analysis workflows, especially when working with complex structures like pandas DataFrames.
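The "no parentheses" rule can be checked directly. In this sketch (made-up DataFrame `frame`), reading `shape` gives a tuple, and trying to call it like a method fails because a tuple is not callable.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

dims = frame.shape  # attribute access: no parentheses
print(dims)         # (3, 2)

try:
    frame.shape()   # treating the attribute like a method
except TypeError as error:
    message = str(error)
    print(message)  # 'tuple' object is not callable
```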




## Key Pandas attributes

| Command           | Description                                                  |
|-------------------|--------------------------------------------------------------|
| `df.shape`        | returns tuple representing the dimensionality of the DataFrame |
| `df.index`        | returns the index as an array-like object                    |
| `df.columns`      | returns the column index as an array-like object             |
| `df.dtypes`       | returns the data types in the DataFrame                      |
| `df.values`       | returns the values of the DataFrame as an array-like object  |
| `df.ndim`         | returns an integer representing the number of axes           |


Whether something is a function, a method, or an attribute depends on how the developers of a language or library designed it. The pandas developers use object-oriented programming and provide all three options to users.
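The attributes from the table, plus `size`, can be inspected on a small made-up DataFrame (the name `frame` and its contents are assumptions for this sketch). Note that `size` is the total number of cells, i.e. rows times columns.

```python
import pandas as pd

frame = pd.DataFrame({"city": ["Lima", "Oslo", "Kyoto"], "visits": [3, 5, 7]})

print(frame.shape)          # (3, 2)
print(frame.ndim)           # 2
print(frame.size)           # 6
print(list(frame.columns))  # ['city', 'visits']
print(list(frame.index))    # [0, 1, 2]
print(frame.dtypes)         # one data type per column
```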

In [None]:
# let's describe our dataframe: what is the shape?

df.shape

In [None]:
# what is the size? and how to interpret it?

df.size

In [None]:
# what are the column names?

df.columns

In [None]:
df.info()
# this is a method though: it performs some functionality
# on the DataFrame object, printing a summary of it (and returning None)
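A short sketch of that last point, using a made-up DataFrame: `info()` prints its summary as a side effect, so the value it hands back is `None`.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

result = frame.info()  # prints the summary to the screen
print(result)          # None
```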