In [70]:
import pandas as pd

In [72]:
Data = {
    'first':['John','jane','doe'],
    'last':['smith','nolan','harry'],
    'email':['john@gmail.com','jane@gmail.com','doe@gamail.com']
}

In [74]:
df = pd.DataFrame(Data)

In [76]:
df

Unnamed: 0,first,last,email
0,John,smith,john@gmail.com
1,jane,nolan,jane@gmail.com
2,doe,harry,doe@gamail.com


Here, a dictionary named `Data` is created with three keys: 'first', 'last', and 'email'. Each key maps to a list of strings holding first names, last names, and email addresses, respectively. Passing the dictionary to `pd.DataFrame()` turns each key into a column and each list into that column's values.

In [79]:
type(df['email'])

pandas.core.series.Series

`type()` is a built-in Python function that returns the type of an object. Here, `type(df['email'])` shows that selecting a single column of the DataFrame `df` returns a pandas Series.
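
To make the Series/DataFrame distinction concrete, here is a minimal sketch that rebuilds a small frame with the same columns (an assumption for self-containment) and compares single-bracket and double-bracket selection.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['John', 'jane', 'doe'],
    'email': ['john@gmail.com', 'jane@gmail.com', 'doe@gamail.com'],
})

single = df['email']     # one column label -> Series
double = df[['email']]   # list of labels -> DataFrame, even with one column

print(type(single).__name__)
print(type(double).__name__)
```

The double-bracket form is the one used in the next cell to pick several columns at once.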


In [82]:
df[['email','first']]

Unnamed: 0,email,first
0,john@gmail.com,John
1,jane@gmail.com,jane
2,doe@gamail.com,doe


In [84]:
df.iloc[0]

first              John
last              smith
email    john@gmail.com
Name: 0, dtype: object

`df.iloc[0]` accesses the first row of the DataFrame `df` using integer-location (position-based) indexing. The row comes back as a Series whose index holds the column names.
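
With the default 0, 1, 2 index, position and label happen to coincide, which hides the difference between `.iloc` and `.loc`. The sketch below uses a hypothetical string index (`'a'`, `'b'`, `'c'`, not part of the notebook) to separate the two.

```python
import pandas as pd

people = pd.DataFrame(
    {'first': ['John', 'jane', 'doe']},
    index=['a', 'b', 'c'],   # hypothetical labels, chosen for illustration
)

by_position = people.iloc[0]   # first row, whatever its label is
by_label = people.loc['a']     # row whose index label is 'a'

# people.loc[0] would raise a KeyError here, because no row is labelled 0.
print(by_position['first'], by_label['first'])
```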


In [87]:
df.loc[0]

first              John
last              smith
email    john@gmail.com
Name: 0, dtype: object

In [89]:
df.loc[[0,1],['email']]

Unnamed: 0,email
0,john@gmail.com
1,jane@gmail.com


`df.loc[[0,1],['email']]` selects rows and columns by label: here the rows labelled 0 and 1 and the column 'email'. Because both selectors are lists, the result is a DataFrame.

In [92]:
df['email'].value_counts()

email
john@gmail.com    1
jane@gmail.com    1
doe@gamail.com    1
Name: count, dtype: int64

`value_counts()` - This method is called on the Series obtained from the 'email' column. It counts the occurrences of each unique value (email address) and returns a new Series with the unique email addresses as the index and their counts as the values. Every address here appears once, so every count is 1.
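
Since every address in the notebook is unique, the counts above are all 1. A small sketch with made-up data containing a repeat shows the method doing more visible work; `normalize=True` is an optional parameter that returns shares instead of counts.

```python
import pandas as pd

emails = pd.Series(
    ['john@gmail.com', 'jane@gmail.com', 'john@gmail.com'],  # made-up data with a repeat
    name='email',
)

counts = emails.value_counts()                 # absolute counts, most frequent first
shares = emails.value_counts(normalize=True)   # relative frequencies

print(counts)
print(shares)
```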


In [95]:
df.loc[0:2,'email']

0    john@gmail.com
1    jane@gmail.com
2    doe@gamail.com
Name: email, dtype: object

In [97]:
df.loc[0:1]

Unnamed: 0,first,last,email
0,John,smith,john@gmail.com
1,jane,nolan,jane@gmail.com
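
Note that `df.loc[0:1]` returned two rows: label slices with `.loc` include the end label, unlike ordinary Python slices. A minimal sketch on a rebuilt frame (an assumption for self-containment) contrasts this with `.iloc`.

```python
import pandas as pd

df = pd.DataFrame({'email': ['john@gmail.com', 'jane@gmail.com', 'doe@gamail.com']})

label_slice = df.loc[0:1]       # labels 0 through 1, end label included
position_slice = df.iloc[0:1]   # positions 0 up to, but not including, 1

print(len(label_slice), len(position_slice))
```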


In [99]:
df['last'] == 'smith'

0     True
1    False
2    False
Name: last, dtype: bool

In [101]:
df[df['last'] == 'smith']

Unnamed: 0,first,last,email
0,John,smith,john@gmail.com


In [103]:
x = df['last'] == 'smith'
df.loc[x]

Unnamed: 0,first,last,email
0,John,smith,john@gmail.com
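
The boolean-mask pattern used in the last three cells extends to several conditions. The sketch below, on a rebuilt frame, combines masks with `&` (and), `|` (or) and `~` (not); the specific conditions are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['John', 'jane', 'doe'],
    'last': ['smith', 'nolan', 'harry'],
})

# Each comparison needs its own parentheses because & and | bind tighter than ==.
both = df[(df['last'] == 'smith') & (df['first'] == 'John')]
either = df[(df['last'] == 'smith') | (df['last'] == 'nolan')]
neither = df[~df['last'].isin(['smith', 'nolan'])]   # ~ negates the mask

print(len(both), len(either), neither['last'].tolist())
```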


In [105]:
df.columns = ['first_name','last_name','email']

In [107]:
df

Unnamed: 0,first_name,last_name,email
0,John,smith,john@gmail.com
1,jane,nolan,jane@gmail.com
2,doe,harry,doe@gamail.com


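`df.columns = ['first_name','last_name','email']` renames all columns of the DataFrame at once. The list must contain one name for every existing column, in the current column order.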
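
When only some columns need new names, `DataFrame.rename` is an alternative to replacing the whole list. This is a sketch on a one-row frame built for the example, not the notebook's own approach.

```python
import pandas as pd

df = pd.DataFrame({'first': ['John'], 'last': ['smith'], 'email': ['john@gmail.com']})

# rename() maps old names to new ones; columns not listed keep their names.
renamed = df.rename(columns={'first': 'first_name', 'last': 'last_name'})

print(list(renamed.columns))
print(list(df.columns))   # the original frame is not modified
```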

In [110]:
df.columns = [df.columns.str.replace('_',' ')]

In [112]:
df

Unnamed: 0,first name,last name,email
0,John,smith,john@gmail.com
1,jane,nolan,jane@gmail.com
2,doe,harry,doe@gamail.com


1. `df.columns.str.replace('_',' ')` - This uses the `str.replace()` method to replace all underscores (`_`) in the column names with spaces (` `). The `str` accessor applies string operations to each element of the index.

2. `[ ... ]` - The result of `str.replace()` is already an Index holding the new names, so the surrounding square brackets are not needed. They wrap that Index in a one-element list, which pandas interprets as the levels of a MultiIndex; this is why `df['last name']` is displayed as a table rather than as a Series in a later cell. Writing `df.columns = df.columns.str.replace('_',' ')` assigns the names directly.

3. `df.columns = ...` - Finally, the modified column names (with underscores replaced by spaces) are assigned back to `df.columns`, updating the DataFrame's column names.
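
The effect of the square brackets can be checked directly. The sketch below, assuming current pandas behaviour and a one-row frame built for the example, assigns the renamed columns once without and once with the surrounding list and prints the resulting column index type.

```python
import pandas as pd

cols = ['first_name', 'last_name', 'email']
plain = pd.DataFrame([['John', 'smith', 'john@gmail.com']], columns=cols)
wrapped = plain.copy()

plain.columns = plain.columns.str.replace('_', ' ')        # Index assigned directly
wrapped.columns = [wrapped.columns.str.replace('_', ' ')]  # Index wrapped in a list

print(type(plain.columns).__name__)
print(type(wrapped.columns).__name__)
```

The unwrapped form keeps a flat column index, which is usually what later column selections expect.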

In [130]:
df.loc[3,['first name','last name','email']] = ['jack','oliver','jack@gmail.com']

In [132]:
df

Unnamed: 0,first name,last name,email
0,John,smith,john@gmail.com
1,jane,nolan,jane@gmail.com
2,doe,harry,doe@gamail.com
3,jack,oliver,jack@gmail.com



1. `df.loc[3, ...]` - The `.loc` indexer addresses the row labelled `3`. No such row exists yet (the labels run from 0 to 2), so the assignment adds a new row with that label instead of modifying an existing one.

2. `['first name','last name','email']` - This list names the columns that receive values in the new row.

3. `= ['jack','oliver','jack@gmail.com']` - The values are matched to the listed columns in order: 'jack' for 'first name', 'oliver' for 'last name', and 'jack@gmail.com' for 'email'.
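
The same `.loc` assignment either overwrites or appends, depending on whether the row label already exists. A minimal sketch on a one-row frame (built for the example) shows both cases.

```python
import pandas as pd

df = pd.DataFrame({'first': ['John'], 'email': ['john@gmail.com']})

df.loc[0, 'first'] = 'Johnny'                               # label 0 exists: value is overwritten
df.loc[1, ['first', 'email']] = ['jack', 'jack@gmail.com']  # label 1 is new: a row is added

print(df)
```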

In [158]:
df['last name'].replace({'harry': 'daniel'})

Unnamed: 0,last name
0,smith
1,nolan
2,daniel
3,oliver



1. `df['last name']` - This accesses the column named 'last name' in the DataFrame `df`.

2. `.replace({'harry': 'daniel'})` - This method is called on the 'last name' column. It replaces occurrences of the string 'harry' with 'daniel' in that column.

3. The result is a new object with the replacements made. The original DataFrame `df` is not modified unless the result is assigned back, for example to `df['last name']`.
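
The difference between discarding the result and assigning it back can be sketched as follows; the one-column frame is rebuilt here so the example runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'last': ['smith', 'nolan', 'harry']})

df['last'].replace({'harry': 'daniel'})                # result discarded, df unchanged
before = df['last'].tolist()

df['last'] = df['last'].replace({'harry': 'daniel'})   # assigned back, df updated
after = df['last'].tolist()

print(before)
print(after)
```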

In [161]:
df.columns = ['first','last','email']

In [165]:
df['full name'] = df['first'] + ' ' + df['last']

1. **Column Creation**: It creates a new column in the DataFrame `df` called `'full name'`.

2. **String Concatenation**: The new column is populated by concatenating the values from the existing columns `'first'` and `'last'`. 

3. **Space Separator**: The string `' '` (a space) is added between the first and last names to ensure that the full name is formatted correctly.
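
One caveat with `+` concatenation is that a missing part makes the whole result missing. The sketch below uses a made-up row with no last name; filling with an empty string and stripping the leftover space is one possible workaround, not the only one.

```python
import pandas as pd

# Made-up data: the second person has no last name.
df = pd.DataFrame({'first': ['John', 'jane'], 'last': ['smith', None]})

plain = df['first'] + ' ' + df['last']   # missing whenever either part is missing

# Treat a missing part as empty text, then trim the leftover space.
filled = (df['first'] + ' ' + df['last'].fillna('')).str.strip()

print(plain.isna().tolist())
print(filled.tolist())
```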

In [168]:
df

Unnamed: 0,first,last,email,full name
0,John,smith,john@gmail.com,John smith
1,jane,nolan,jane@gmail.com,jane nolan
2,doe,harry,doe@gamail.com,doe harry
3,jack,oliver,jack@gmail.com,jack oliver


In [170]:
df.drop(columns = ['first','last'],inplace=True)

1. `df.drop(...)` - This method is called on the DataFrame `df` to remove specified columns.

2. `columns = ['first', 'last']` - This argument specifies the list of column names to be dropped from the DataFrame. In this case, the columns named 'first' and 'last' will be removed.

3. `inplace=True` - This parameter indicates that the operation should modify the original DataFrame `df` directly, rather than returning a new DataFrame with the specified columns removed. With `inplace=False` (the default), a new DataFrame is returned and the original `df` remains unchanged.
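
For comparison, here is a minimal sketch of the non-in-place form, where the result is captured in a new variable (`trimmed` is a name chosen for this example).

```python
import pandas as pd

df = pd.DataFrame({'first': ['John'], 'last': ['smith'], 'email': ['john@gmail.com']})

trimmed = df.drop(columns=['first', 'last'])   # returns a new DataFrame

print(list(trimmed.columns))
print(list(df.columns))   # df still has all three columns
```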

In [173]:
df

Unnamed: 0,email,full name
0,john@gmail.com,John smith
1,jane@gmail.com,jane nolan
2,doe@gamail.com,doe harry
3,jack@gmail.com,jack oliver


In [175]:
df[['first','last']] = df['full name'].str.split(' ',expand=True)

1. **Accessing the 'full name' column**: `df['full name']` retrieves the column named 'full name' from the DataFrame `df`.

2. **Splitting the string**: The `.str.split(' ', expand=True)` method is called on the 'full name' column. This method splits each string in the column at the space character (' '), creating separate components for each part of the name.

3. **Expanding the result**: The `expand=True` argument ensures that the split results are returned as separate columns in a DataFrame rather than as a Series of lists.

4. **Assigning to new columns**: The result of the split operation is assigned to two new columns in the original DataFrame `df`, named 'first' and 'last'. This effectively creates two new columns that contain the first and last names extracted from the 'full name' column.
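
Splitting on every space yields more than two columns as soon as a name has three parts, and the two-column assignment then fails. A sketch with a made-up three-part name shows the `n=1` parameter, which limits the split to the first space.

```python
import pandas as pd

# The second name is made up to show a value with more than one space.
names = pd.Series(['John smith', 'mary ann lee'], name='full name')

# n=1 splits at the first space only, so there are at most two columns.
parts = names.str.split(' ', n=1, expand=True)
parts.columns = ['first', 'last']

print(parts)
```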






In [178]:
df

Unnamed: 0,email,full name,first,last
0,john@gmail.com,John smith,John,smith
1,jane@gmail.com,jane nolan,jane,nolan
2,doe@gamail.com,doe harry,doe,harry
3,jack@gmail.com,jack oliver,jack,oliver


In [182]:
df.drop(columns = ['full name'], inplace = True)

In [184]:
df

Unnamed: 0,email,first,last
0,john@gmail.com,John,smith
1,jane@gmail.com,jane,nolan
2,doe@gamail.com,doe,harry
3,jack@gmail.com,jack,oliver


In [188]:
new = pd.DataFrame([{'first':'levi','last':'lucas','email':'levi@gmail.com'}])
df = pd.concat([df,new],ignore_index = True)

1. `new = pd.DataFrame([{'first':'levi','last':'lucas','email':'levi@gmail.com'}])` - This line creates a new DataFrame named `new` using the `pd.DataFrame()` constructor. It contains a single row with three columns: 'first', 'last', and 'email', populated with the values 'levi', 'lucas', and 'levi@gmail.com', respectively.

2. `df = pd.concat([df,new], ignore_index=True)` - This line concatenates the existing DataFrame `df` with the newly created DataFrame `new`. 
   - `pd.concat()` is used to combine the two DataFrames.
   - The first argument is a list containing the DataFrames to concatenate: `[df, new]`.
   - `ignore_index=True` ensures that the resulting DataFrame will have a new integer index that runs from 0 to n-1, where n is the total number of rows in the concatenated DataFrame. This prevents index duplication from the original DataFrames.
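
The index duplication mentioned above can be seen by running the concatenation both ways. This sketch uses two small frames built for the example.

```python
import pandas as pd

df = pd.DataFrame({'first': ['John', 'jane']})
new = pd.DataFrame([{'first': 'levi'}])

kept = pd.concat([df, new])                            # original labels kept
renumbered = pd.concat([df, new], ignore_index=True)   # fresh labels from 0

print(kept.index.tolist())
print(renumbered.index.tolist())
```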

In [191]:
df

Unnamed: 0,email,first,last
0,john@gmail.com,John,smith
1,jane@gmail.com,jane,nolan
2,doe@gamail.com,doe,harry
3,jack@gmail.com,jack,oliver
4,levi@gmail.com,levi,lucas


In [199]:
df[df['last'] == 'lucas'].index

Index([4], dtype='int64')

In [203]:
df.drop(index = df[df['last'] == 'lucas'].index, inplace=True)


1. `df['last'] == 'lucas'` - This creates a boolean Series that checks each row in the 'last' column of the DataFrame `df` to see if it equals 'lucas'. It returns `True` for rows where the condition is met and `False` otherwise.

2. `df[df['last'] == 'lucas']` - This filters the DataFrame `df` to include only the rows where the condition is `True`, effectively selecting all rows where the 'last' column has the value 'lucas'.

3. `.index` - This retrieves the index labels of the filtered DataFrame, which corresponds to the rows where 'last' is 'lucas'.

4. `df.drop(index = ..., inplace=True)` - The `drop()` method is called on the original DataFrame `df`. The `index` parameter is set to the indices obtained in the previous step, meaning it will remove those rows from `df`. The `inplace=True` argument modifies the original DataFrame directly, rather than returning a new DataFrame with the rows removed. 
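
An equivalent way to remove the matching rows is to keep everything that does not match. The sketch below is an alternative to the `drop()` call above, shown on a rebuilt frame; `reset_index(drop=True)` is optional and only renumbers the remaining rows.

```python
import pandas as pd

df = pd.DataFrame({
    'first': ['John', 'levi', 'jane'],
    'last': ['smith', 'lucas', 'nolan'],
})

# Keep the rows whose last name is not 'lucas', then renumber from 0.
remaining = df[df['last'] != 'lucas'].reset_index(drop=True)

print(remaining)
```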

In [206]:
df

Unnamed: 0,email,first,last
0,john@gmail.com,John,smith
1,jane@gmail.com,jane,nolan
2,doe@gamail.com,doe,harry
3,jack@gmail.com,jack,oliver


In [208]:
df.sort_values(by='last')

Unnamed: 0,email,first,last
2,doe@gamail.com,doe,harry
1,jane@gmail.com,jane,nolan
3,jack@gmail.com,jack,oliver
0,john@gmail.com,John,smith
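
`sort_values()` returns a sorted copy and compares strings case-sensitively, so capitalised values such as 'John' sort before lowercase ones. The sketch below sorts the 'first' column both ways; the `key` function that lowercases the values is an illustrative choice, not something the notebook uses.

```python
import pandas as pd

df = pd.DataFrame({'first': ['John', 'jane', 'doe', 'jack']})

default_order = df.sort_values(by='first')['first'].tolist()

# key receives the column as a Series and returns the values to sort by.
folded_order = df.sort_values(by='first', key=lambda s: s.str.lower())['first'].tolist()

print(default_order)
print(folded_order)
```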


In [210]:
df['last'].count()

4

1. `df` - This refers to a Pandas DataFrame, which is a two-dimensional labeled data structure commonly used for data manipulation and analysis.

2. `['last']` - This part accesses the column named 'last' in the DataFrame `df`. It returns a Series containing all the values from that specific column.

3. `.count()` - This method is called on the Series obtained from the 'last' column. It counts the number of non-null (non-NaN) entries in that column.
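
Because `count()` skips missing entries, it can differ from the length of the column. A minimal sketch with one made-up missing value:

```python
import pandas as pd

df = pd.DataFrame({'last': ['smith', None, 'harry']})  # made-up missing value

non_null = df['last'].count()   # counts only non-missing entries
total = len(df['last'])         # counts every row

print(non_null, total)
```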

In [215]:
df.dropna()

Unnamed: 0,email,first,last
0,john@gmail.com,John,smith
1,jane@gmail.com,jane,nolan
2,doe@gamail.com,doe,harry
3,jack@gmail.com,jack,oliver


1. `df` - This refers to a Pandas DataFrame, which is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure.

2. `.dropna()` - This method is called on the DataFrame `df`. It identifies and removes any rows that have at least one missing value (NaN) in any of their columns.

3. By default, `dropna()` returns a new DataFrame with the rows containing NaN values removed, but it does not modify the original DataFrame unless the `inplace=True` parameter is specified.
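
The notebook's frame has no missing values, so `dropna()` returns it unchanged. The sketch below uses made-up gaps to show the effect, and also the `subset` parameter, which restricts the check to chosen columns.

```python
import pandas as pd

# Made-up data with one gap in each column.
df = pd.DataFrame({
    'first': ['John', None, 'doe'],
    'email': ['john@gmail.com', 'jane@gmail.com', None],
})

cleaned = df.dropna()                      # drops rows with any missing value
with_email = df.dropna(subset=['email'])   # only checks the 'email' column

print(len(cleaned), len(with_email), len(df))
```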

In [218]:
df.dropna(axis='index',how ='all')

Unnamed: 0,email,first,last
0,john@gmail.com,John,smith
1,jane@gmail.com,jane,nolan
2,doe@gamail.com,doe,harry
3,jack@gmail.com,jack,oliver


1. `df` - This refers to the DataFrame from which you want to drop rows.

2. `.dropna()` - This is a method of the DataFrame that is used to remove missing values.

3. `axis='index'` - This argument specifies that rows are the unit being dropped. With `axis='columns'` (or `axis=1`), columns would be dropped instead.

4. `how='all'` - This argument indicates that a row should be dropped only if all of its values are NaN. If any value in the row is not NaN, the row is retained.
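
The difference between `how='any'` (the default) and `how='all'` only shows up on data with gaps. A sketch with a made-up partially empty row and a fully empty row:

```python
import pandas as pd

# Made-up data: row 1 is partly missing, row 2 is entirely missing.
df = pd.DataFrame({
    'first': ['John', 'jane', None],
    'email': ['john@gmail.com', None, None],
})

any_missing = df.dropna(axis='index', how='any')   # drops rows 1 and 2
all_missing = df.dropna(axis='index', how='all')   # drops only row 2

print(any_missing.index.tolist())
print(all_missing.index.tolist())
```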

In [221]:
df.isna()

Unnamed: 0,email,first,last
0,False,False,False
1,False,False,False
2,False,False,False
3,False,False,False



1. `df` - This refers to a pandas DataFrame that contains your data.

2. `.isna()` - This method is called on the DataFrame. It returns a new DataFrame of the same shape as `df`, where each element is a boolean value: `True` if the corresponding element in `df` is missing (NaN), and `False` if it is not.
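
The boolean frame returned by `isna()` is usually summarised rather than read cell by cell. This sketch, on made-up data with gaps, counts missing values per column and rows with at least one gap.

```python
import pandas as pd

# Made-up data with missing values.
df = pd.DataFrame({
    'first': ['John', 'jane', None],
    'email': ['john@gmail.com', None, None],
})

missing_per_column = df.isna().sum()                    # True counts as 1
rows_with_missing = int(df.isna().any(axis=1).sum())    # rows with at least one gap

print(missing_per_column)
print(rows_with_missing)
```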

In [224]:
df['email'].unique()

array(['john@gmail.com', 'jane@gmail.com', 'doe@gamail.com',
       'jack@gmail.com'], dtype=object)


1. `df` - This refers to a pandas DataFrame that contains various data, including an 'email' column.

2. `['email']` - This part accesses the 'email' column of the DataFrame. It returns a Series containing all the values in that column.

3. `.unique()` - This method is called on the Series obtained from the 'email' column. It returns an array of the unique values present in that column, in order of first appearance, with duplicates removed.
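
With a repeated value the de-duplication becomes visible. The sketch below uses made-up data and also shows `nunique()`, which returns just the number of distinct values.

```python
import pandas as pd

emails = pd.Series(
    ['john@gmail.com', 'jane@gmail.com', 'john@gmail.com'],  # made-up data with a repeat
    name='email',
)

distinct = list(emails.unique())   # distinct values in order of first appearance
how_many = emails.nunique()        # number of distinct values

print(distinct)
print(how_many)
```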