# **Main commands to perform a basic review of a DataFrame in pandas**

# **df.shape**

Description: Returns the shape of the DataFrame as a tuple, where the first value is the number of rows and the second is the number of columns.

Usage: Useful to know how many records and variables are in your DataFrame.

In [None]:
print(df.shape)
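The tuple can be unpacked into separate row and column counts. This is a minimal sketch using a small hypothetical DataFrame; the `City` and `Temperature` columns and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical example data used only for illustration
df = pd.DataFrame({
    "City": ["Madrid", "Lima", "Madrid", "Bogota"],
    "Temperature": [21.5, 18.0, 23.1, None],
})

n_rows, n_cols = df.shape  # unpack the (rows, columns) tuple
print(f"{n_rows} rows, {n_cols} columns")  # 4 rows, 2 columns
```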

# **df.head(n)**

Description: Shows the first n rows of the DataFrame (by default, the first 5 rows). Useful for getting a quick overview of the first records.

Usage: To check the structure of the initial data in the DataFrame.

In [None]:
print(df.head(2))  # Show the first 2 rows
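`head()` also accepts a negative n, in which case it returns every row except the last |n| rows. A short sketch on a hypothetical 10-row DataFrame:

```python
import pandas as pd

# Hypothetical 10-row DataFrame with values 0..9
df = pd.DataFrame({"value": range(10)})

first_five = df.head()         # default n=5
all_but_last_3 = df.head(-3)   # negative n: every row except the last 3

print(len(first_five))      # 5
print(len(all_but_last_3))  # 7
```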

# **df.tail(n)**

Description: Shows the last n rows of the DataFrame (by default shows the last 5 rows). Useful to view the last records in the DataFrame.

Usage: To review the final records in the DataFrame.

In [None]:
print(df.tail(2))  # Show the last 2 rows
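Symmetrically, `tail()` with a negative n returns every row except the first |n| rows. A sketch on the same kind of hypothetical 10-row DataFrame:

```python
import pandas as pd

# Hypothetical 10-row DataFrame with values 0..9
df = pd.DataFrame({"value": range(10)})

last_five = df.tail()          # default n=5
all_but_first_3 = df.tail(-3)  # negative n: every row except the first 3

print(last_five["value"].tolist())     # [5, 6, 7, 8, 9]
print(all_but_first_3["value"].iloc[0])  # 3
```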

# **df.info()**

Description: Provides a summary of the DataFrame, including the number of entries (rows), data types of each column, and how many non-null values each column has.

Usage: Useful to understand the structure of the data, the type of each column, and if there are missing values.

In [None]:
df.info()  # info() prints the summary itself and returns None, so print() is not needed
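Because `info()` writes its report instead of returning it, the output can be captured as a string through the `buf` parameter, for example to log it. A minimal sketch, assuming a small hypothetical DataFrame:

```python
import io

import pandas as pd

# Hypothetical data: one missing value in "City"
df = pd.DataFrame({"City": ["Madrid", "Lima", None], "Temperature": [21.5, 18.0, 23.1]})

buffer = io.StringIO()
result = df.info(buf=buffer)  # writes the summary into the buffer instead of stdout
summary = buffer.getvalue()

print(result)   # None: info() only writes, it does not return the summary
print(summary)
```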

# **df.describe()**

Description: Provides a statistical summary of the numerical columns in the DataFrame: count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%) and maximum. By default, non-numeric columns are left out.

Usage: To get basic statistics for the numerical variables.

In [None]:
print(df.describe())
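Non-numeric columns can be summarized too by passing `include` or `exclude`; for text columns the statistics become count, unique, top (most frequent value) and freq. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "City": ["Madrid", "Lima", "Madrid", "Bogota"],
    "Temperature": [21.5, 18.0, 23.1, 19.4],
})

numeric_summary = df.describe()              # numeric columns only (default)
text_summary = df.describe(exclude="number") # non-numeric columns: count, unique, top, freq

print(numeric_summary)
print(text_summary)
```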

# **df.columns**

Description: Returns the column names of the DataFrame as an Index object (use df.columns.tolist() if you need a plain Python list).

Usage: Useful to review or rename the columns of the DataFrame.

In [None]:
print(df.columns)
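Since the usage above mentions renaming, here is a sketch that lists the column names and renames one column with `rename(columns=...)`. The column names are hypothetical.

```python
import pandas as pd

# Hypothetical DataFrame with an awkward column name
df = pd.DataFrame({"City": ["Madrid"], "Temp C": [21.5]})

names = df.columns.tolist()  # plain Python list of column names

# Rename a single column without touching the others
df = df.rename(columns={"Temp C": "Temperature"})
print(df.columns.tolist())  # ['City', 'Temperature']
```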

# **df.dtypes**

Description: Displays the data types of each column in the DataFrame.

Usage: Useful to know the data type of each column, like int64, float64, object, etc.

In [None]:
print(df.dtypes)
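Once the types are known, `select_dtypes` can pick out columns of a given kind, for example only the numeric ones. A sketch with hypothetical columns:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "City": ["Madrid", "Lima"],
    "Temperature": [21.5, 18.0],
    "Visitors": [120, 95],
})

print(df.dtypes)

# Keep only numeric columns (integers and floats)
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['Temperature', 'Visitors']
```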

# **df.isnull().sum()**

Description: Returns the number of missing values per column.

Usage: Useful to identify how many missing values there are in each column.

In [None]:
print(df.isnull().sum())
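`isna()` is an alias of `isnull()`, and taking the mean instead of the sum gives the share of missing values per column, which is often easier to compare across columns. A sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few missing values
df = pd.DataFrame({
    "City": ["Madrid", None, "Lima", "Lima"],
    "Temperature": [21.5, np.nan, np.nan, 19.0],
})

missing_counts = df.isnull().sum()        # same result as df.isna().sum()
missing_share = df.isnull().mean() * 100  # percentage of missing values per column

print(missing_counts)
print(missing_share)
```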

# **df['column'].value_counts()**

Description: Counts how many times each unique value appears in a column, sorted from most to least frequent. Missing values are excluded by default.

Usage: Useful to see how often each value occurs in a column.

In [None]:
print(df['City'].value_counts())
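Two commonly used parameters are `dropna=False`, which counts missing values as their own category, and `normalize=True`, which returns relative frequencies instead of counts. A sketch with hypothetical city names:

```python
import pandas as pd

# Hypothetical data, including one missing city
df = pd.DataFrame({"City": ["Madrid", "Lima", "Madrid", None, "Madrid"]})

counts = df["City"].value_counts()                    # NaN excluded by default
with_missing = df["City"].value_counts(dropna=False)  # NaN counted as its own category
shares = df["City"].value_counts(normalize=True)      # fractions of the non-missing values

print(counts)
print(with_missing)
print(shares)
```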

# **df.sample(n)**

Description: Returns a random sample of n rows from the DataFrame.

Usage: Useful to review a random sample of the data.

In [None]:
print(df.sample(2))  # Show 2 random rows
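By default each call returns different rows. Passing `random_state` makes the sample reproducible, and `frac` samples a fraction of the rows instead of a fixed count. A sketch on a hypothetical 100-row DataFrame:

```python
import pandas as pd

# Hypothetical 100-row DataFrame
df = pd.DataFrame({"value": range(100)})

sample_a = df.sample(n=5, random_state=42)  # fixed seed: same rows on every run
sample_b = df.sample(n=5, random_state=42)
ten_percent = df.sample(frac=0.1, random_state=0)  # 10% of the rows

print(sample_a.equals(sample_b))  # True
print(len(ten_percent))           # 10
```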

# **df.index.is_unique**

Description: The is_unique property of a DataFrame's index checks whether the index contains only unique values. It returns a boolean value: True if all index values are unique and False if there are any duplicate index values. This is useful for ensuring that each row can be uniquely identified by its index, which is important for certain operations, such as merging or joining DataFrames.

Usage: You can use df.index.is_unique to quickly verify the uniqueness of the index in your DataFrame.

In [None]:
is_unique = df.index.is_unique
print("Is the index unique?", is_unique)
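When `is_unique` is False, `index.duplicated()` shows which labels are repeated, and can also be used to keep just one row per label. A sketch with a hypothetical index containing a repeated label; keeping the first occurrence is only one possible fix.

```python
import pandas as pd

# Hypothetical DataFrame where the label "a" appears twice
df = pd.DataFrame({"value": [10, 20, 30]}, index=["a", "b", "a"])

print(df.index.is_unique)  # False

# Show every row whose index label is repeated
dupes = df[df.index.duplicated(keep=False)]
print(dupes)

# Keep only the first occurrence of each label
df_dedup = df[~df.index.duplicated(keep="first")]
print(df_dedup.index.is_unique)  # True
```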

# **df.set_index()**

Description: Sets one or more columns of a DataFrame as the index. This makes the DataFrame easier to query and filter by label (for example with df.loc), can speed up label-based lookups, and can make the DataFrame more readable.

By default (drop=True), the column(s) you set as the index are removed from the regular columns and become the row labels.

Usage: You can use df.set_index(keys) to specify the column(s) you want to use as the index. The keys parameter can take a single column name or a list of column names. The method returns a new DataFrame, so assign the result back to keep the change.

In [None]:
df = df.set_index('column_name')  # replace 'column_name' with the column you want as the index
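A sketch of the variants described above, using hypothetical columns: a single column, `drop=False` to keep the column as well, and a list of columns, which produces a MultiIndex.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "City": ["Madrid", "Lima"],
    "Year": [2023, 2023],
    "Temperature": [21.5, 18.0],
})

by_city = df.set_index("City")               # "City" moves from the columns to the index
keep_col = df.set_index("City", drop=False)  # "City" becomes the index but also stays a column
multi = df.set_index(["City", "Year"])       # a list of columns builds a MultiIndex

print(by_city.loc["Lima", "Temperature"])  # 18.0
```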

# **df.index**

Description: The index attribute of a pandas DataFrame returns the index (row labels) of the DataFrame. It provides access to the labels that are used to identify each row. The index can be a simple integer index, a date-time index, or even a custom index based on one or more columns. This attribute allows you to inspect, manipulate, and perform various operations on the index.

Usage: You can use df.index to retrieve the index of the DataFrame. It can also be modified or reassigned if needed.

In [None]:
print(df.index)
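A sketch showing the default integer index, the index after `set_index` (which keeps the column's name), and `reset_index()`, which moves the index back into a regular column. The data is hypothetical.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"City": ["Madrid", "Lima"], "Temperature": [21.5, 18.0]})
print(df.index)  # default RangeIndex with labels 0 and 1

df = df.set_index("City")
index_name = df.index.name  # "City": the index keeps the name of the column it came from
labels = df.index.tolist()  # ['Madrid', 'Lima']

df = df.reset_index()       # move the index back into a regular column
print(df.columns.tolist())  # ['City', 'Temperature']
```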

# **df.index.to_series().diff().dt.total_seconds()**

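Description: This command calculates the time difference, in seconds, between each timestamp in the index and the previous one. It requires a DatetimeIndex, and the first value is always NaN because it has no previous timestamp.

Usage: To verify that the data were sampled at the expected frequency. Combined with value_counts(), it shows how often each interval occurs, so irregular gaps stand out.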

In [None]:
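# Requires a DatetimeIndex; the first difference is NaN because it has no previous timestamp
df_time_diffs = df.index.to_series().diff().dt.total_seconds()
print(df_time_diffs.value_counts())  # how many times each interval (in seconds) occurs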
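A self-contained sketch of this check, assuming hypothetical sensor readings that should arrive every 10 seconds but have one missing sample. The 10-second threshold is an assumption for the example; use whatever period your data is supposed to have.

```python
import pandas as pd

# Hypothetical readings every 10 seconds, with the 00:00:30 reading missing
timestamps = pd.to_datetime([
    "2024-01-01 00:00:00",
    "2024-01-01 00:00:10",
    "2024-01-01 00:00:20",
    "2024-01-01 00:00:40",
])
df = pd.DataFrame({"reading": [1.0, 1.2, 1.1, 1.3]}, index=timestamps)

# Seconds between each timestamp and the previous one (first value is NaN)
df_time_diffs = df.index.to_series().diff().dt.total_seconds()
print(df_time_diffs.value_counts())  # 10.0 occurs twice, 20.0 once

# Flag intervals longer than the expected 10-second sampling period
gaps = df_time_diffs[df_time_diffs > 10]
print(gaps)
```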

Summary of the Commands:

*   df.shape: Returns the number of rows and columns.
*   df.head(n): Displays the first n rows.
*   df.tail(n): Displays the last n rows.
*   df.info(): Provides a summary of the structure of the DataFrame.
*   df.describe(): Returns descriptive statistics for numerical columns.
*   df.columns: Shows the names of the columns.
*   df.dtypes: Displays the data types of each column.
*   df.isnull().sum(): Returns the number of missing values per column.
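*   df['column'].value_counts(): Counts how often each unique value appears in a column.
*   df.sample(n): Returns a random sample of n rows.
*   df.index.is_unique: Checks whether every index label is unique.
*   df.set_index(): Sets one or more columns as the index.
*   df.index: Returns the index (row labels) of the DataFrame.
*   df.index.to_series().diff().dt.total_seconds(): Returns the time, in seconds, between consecutive timestamps in the index.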