## How to Interact with this Jupyter Notebook

In this activity, you will use a Jupyter Notebook, which integrates both text and code. The gray boxes contain executable code, which you will run in order to view its output. The text in between the code provides instructions.

## Scenario: Charting the Customer Journey with Pandas

Imagine you're a Python developer at a rapidly growing e-commerce company. The marketing team is eager to understand customer behavior and preferences to tailor their campaigns and improve the overall shopping experience. They've provided you with a valuable dataset containing information about customers, their purchases, and demographics. 

Your task is to leverage your Python skills and the power of the Pandas library to load this dataset, explore its structure, and uncover preliminary insights that will guide further analysis. This initial exploration is crucial for understanding the data you're working with and making informed decisions about how to proceed with more in-depth analysis and visualization.

In the cell below, begin by importing the `pandas` library with the alias `pd`. Then, use `.read_csv()` to load the `customer_data_50.csv` file into a DataFrame named `customer_data`. 

Lastly, run the cell.

In [1]:
# Import the pandas library with the alias 'pd'

# insert code here 
import pandas as pd

# Load the CSV file 'customer_data_50.csv' into a DataFrame

# insert code here 
customer_data = pd.read_csv('customer_data_50.csv')

Run the following cell, which checks the dimensions of your DataFrame using the `.shape` attribute. This tells you how many rows and columns your data has, much like checking the size of a spreadsheet.

In [2]:
# Display the shape of the DataFrame (rows, columns)
print("\nShape of the DataFrame (rows, columns):", customer_data.shape)


Shape of the DataFrame (rows, columns): (50, 13)
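`.shape` is a plain `(rows, columns)` tuple, so it can be unpacked directly into two variables. A minimal sketch using a small made-up DataFrame (the values are illustrative, not the real customer file):

```python
import pandas as pd

# Hypothetical three-row DataFrame standing in for customer_data
df = pd.DataFrame({"customer_id": [1001, 1002, 1003], "age": [54, 66, 56]})

rows, cols = df.shape  # shape is a (rows, columns) tuple
print(f"{rows} rows x {cols} columns")  # 3 rows x 2 columns
print(len(df) == rows)  # len() of a DataFrame counts rows -> True
```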


Next, you'll inspect the data using the `df.head()` function, which allows you to view the first few rows of the DataFrame. This gives you a quick look at the data's structure and content.

In the cell below, use `df.head()` to display the first 5 rows of the `customer_data` DataFrame. Then, run the cell and take a moment to observe the output.

In [4]:
# Display the first 5 rows
print("First 5 rows:\n")

# insert code here 
display(customer_data.head())

First 5 rows:



|   | customer_id | first_name | last_name | email | gender | age | city | state | country | purchase_count | total_spend | avg_order_value | last_purchase_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | Sophia | Smith | sophia.smith@example.com | M | 54 | San Antonio | TX | USA | 5 | 965 | 193.0 | 2023-09-12 15:28:32.140488 |
| 1 | 1002 | Joseph | Smith | joseph.smith@example.com | M | 66 | Los Angeles | CA | USA | 7 | 1246 | 178.0 | 2023-08-25 15:28:32.140488 |
| 2 | 1003 | John | Anderson | john.anderson@example.com | F | 56 | Phoenix | AZ | USA | 1 | 199 | 199.0 | 2024-04-28 15:28:32.140488 |
| 3 | 1004 | Emma | Hernandez | emma.hernandez@example.com | M | 44 | Los Angeles | CA | USA | 14 | 3752 | 268.0 | 2024-01-01 15:28:32.140488 |
| 4 | 1005 | Emily | Garcia | emily.garcia@example.com | F | 25 | Dallas | TX | USA | 12 | 1620 | 135.0 | 2023-12-09 15:28:32.140488 |

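`head()` returns 5 rows by default, but it takes an optional row count, and `tail()` does the same from the end of the DataFrame. A minimal sketch on a made-up ten-row DataFrame:

```python
import pandas as pd

# Hypothetical ten-row DataFrame
df = pd.DataFrame({"n": range(10)})

first_five = df.head()     # n defaults to 5
first_three = df.head(3)   # ask for a specific number of rows
last_two = df.tail(2)      # tail() works the same way from the end

print(first_five["n"].tolist())   # [0, 1, 2, 3, 4]
print(first_three["n"].tolist())  # [0, 1, 2]
print(last_two["n"].tolist())     # [8, 9]
```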

Now, you'll use the `df.info()` function, which provides a concise summary of the DataFrame, including the column names, their data types, and the number of non-null values.

In the cell below, use `df.info()` to print information about the `customer_data` DataFrame. Then, run the cell and take a moment to observe the output.

In [5]:
# Print the column names and their data types
print("\nColumn names and their data types:\n")

# insert code here 
customer_data.info()  # info() prints its own summary and returns None


Column names and their data types:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 13 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   customer_id         50 non-null     int64
 1   first_name          50 non-null     object
 2   last_name           50 non-null     object
 3   email               50 non-null     object
 4   gender              50 non-null     object
 5   age                 50 non-null     int64
 6   city                50 non-null     object
 7   state               50 non-null     object
 8   country             50 non-null     object
 9   purchase_count      50 non-null     int64
 10  total_spend         50 non-null     int64
 11  avg_order_value     50 non-null     float64
 12  last_purchase_date  50 non-null     object
dtypes: float64(1), int64(4), object(8)
memory usage: 5.2+ KB

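`info()` writes its summary straight to standard output and returns `None`, so wrapping it in `display()` would also show a stray `None`. If you need the summary as a string (for logging, say), the `buf` parameter accepts any writable text buffer. A minimal sketch with a made-up DataFrame:

```python
import io

import pandas as pd

# Hypothetical DataFrame with one missing age
df = pd.DataFrame({"age": [54, 66, None], "city": ["Phoenix", "Dallas", "Austin"]})

result = df.info()     # prints the summary; the return value is None
print(result is None)  # True

buffer = io.StringIO()
df.info(buf=buffer)    # write the summary into the buffer instead of stdout
summary = buffer.getvalue()
print("2 non-null" in summary)  # 'age' has 2 non-null values -> True
```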
Next, you'll use the `df.describe()` function, which generates descriptive statistics for the numerical columns in the DataFrame.

In the cell below, use `df.describe()` to display summary statistics for the numerical columns in the `customer_data` DataFrame.

In [7]:
# Display descriptive statistics for numerical columns
print("\nDescriptive statistics for numerical columns:\n")

# insert code here 
display(customer_data.describe())


Descriptive statistics for numerical columns:



|   | customer_id | age | purchase_count | total_spend | avg_order_value |
|---|---|---|---|---|---|
| count | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 |
| mean | 1025.5 | 43.44 | 8.6 | 1491.88 | 179.92 |
| std | 14.57738 | 14.833993 | 4.28095 | 968.697666 | 70.820221 |
| min | 1001.0 | 19.0 | 1.0 | 199.0 | 53.0 |
| 25% | 1013.25 | 30.0 | 5.0 | 819.0 | 125.75 |
| 50% | 1025.5 | 44.5 | 8.0 | 1350.0 | 180.0 |
| 75% | 1037.75 | 54.0 | 12.0 | 1916.0 | 237.5 |
| max | 1050.0 | 69.0 | 15.0 | 4440.0 | 299.0 |

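`describe()` returns an ordinary DataFrame indexed by statistic name (`count`, `mean`, `std`, `min`, `25%`, `50%`, `75%`, `max`), so individual values can be pulled out with `.loc`. A sketch using hypothetical ages rather than the customer data:

```python
import pandas as pd

# Hypothetical ages; not the customer dataset
df = pd.DataFrame({"age": [20, 30, 40, 50]})

stats = df.describe()            # one column per numeric column
print(stats.loc["mean", "age"])  # 35.0
print(stats.loc["50%", "age"])   # 35.0 (the median)
print(stats.loc["25%", "age"])   # 27.5 (linear interpolation between 20 and 30)
```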

Finally, in the code cell below, you'll use the `.mean()` and `.median()` functions on the `'age'` column of your `customer_data` to calculate the average and median age of all your customers. 

The square brackets `[]` are used for column selection in Pandas. Within the brackets, you specify the name of the column you want to extract, which in this case is `'age'`.

Run the cell to see the average and median age of your customers.

In [10]:
# Calculate the mean of the 'age' column
mean_age = customer_data['age'].mean() # insert code here 

# Print the mean age
print("\nMean Age:", mean_age)

# Calculate the median of the 'age' column
median_age = customer_data['age'].median() # insert code here 

# Print the median age
print("\nMedian Age:", median_age)


Mean Age: 43.44

Median Age: 44.5
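The mean and median diverge when a few values are extreme, which is why it is worth looking at both. Here they are close (43.44 vs 44.5), suggesting the ages are not strongly skewed. For contrast, a sketch with made-up ages that include one outlier:

```python
import pandas as pd

# Hypothetical ages; 95 is an outlier
ages = pd.DataFrame({"age": [22, 25, 27, 30, 95]})

mean_age = ages["age"].mean()      # pulled upward by the outlier
median_age = ages["age"].median()  # middle value, unaffected by how large 95 is

print(mean_age)    # 39.8
print(median_age)  # 27.0
```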


## Activity Recap: Charting the Customer Journey with Pandas

Congratulations! In this activity, you learned how to load a CSV file into a Pandas DataFrame and use various functions to inspect its structure and contents:

* `pd.read_csv()` is used to load CSV data into a DataFrame.
* `df.head()` shows the first few rows.
* `df.info()` provides a summary of the DataFrame's structure.
* `df.describe()` generates descriptive statistics for numerical columns.
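The steps in the recap can be chained into one short script. Since `customer_data_50.csv` may not be at hand, this sketch reads the first three rows shown in the `head()` output above (limited to five columns) from an in-memory string via `io.StringIO`; `read_csv` accepts any file-like object as well as a path.

```python
import io

import pandas as pd

# First three customers from the head() output, restricted to five columns
csv_text = """customer_id,age,city,purchase_count,total_spend
1001,54,San Antonio,5,965
1002,66,Los Angeles,7,1246
1003,56,Phoenix,1,199
"""

customer_sample = pd.read_csv(io.StringIO(csv_text))

print(customer_sample.shape)       # (3, 5)
print(customer_sample.head())      # first rows (all 3 here)
customer_sample.info()             # dtypes and non-null counts; returns None
print(customer_sample.describe())  # statistics for the 4 numeric columns only
```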

In [1]:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 28]}
pd.DataFrame(data)

|   | Name | Age |
|---|---|---|
| 0 | Alice | 25 |
| 1 | Bob | 30 |
| 2 | Charlie | 28 |


In [3]:
data = [['Alice', 25], ['Bob', 30], ['Charlie', 28]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
df

|   | Name | Age |
|---|---|---|
| 0 | Alice | 25 |
| 1 | Bob | 30 |
| 2 | Charlie | 28 |


In [7]:
df.query('Age>25')

|   | Name | Age |
|---|---|---|
| 1 | Bob | 30 |
| 2 | Charlie | 28 |


In [8]:
type(df.query('Age>25'))

pandas.core.frame.DataFrame

In [9]:
df.query('Age>25')['Age']

1    30
2    28
Name: Age, dtype: int64
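`query()` evaluates an expression string against the column names and returns a new DataFrame, equivalent to indexing with a boolean mask. Inside the string, `@name` refers to a Python variable. A sketch with the same three rows as above (`min_age` is a hypothetical threshold):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 28]})

via_query = df.query("Age > 25")  # expression string
via_mask = df[df["Age"] > 25]     # equivalent boolean-mask indexing

print(via_query.equals(via_mask))  # True
print(via_query["Name"].tolist())  # ['Bob', 'Charlie']

min_age = 26
at_least = df.query("Age >= @min_age")  # '@' pulls in the local variable
print(at_least["Name"].tolist())        # ['Bob', 'Charlie']
```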

In [10]:
df = pd.DataFrame({
    'Sequences': [
        [1, 4, 5, 8],
        [9, 7, 6,  1, 0],
        [8, 9, 3],
        [5, 5, 5, 5, 5, 5],
        [-19, -8, 0, 5, 8, 10],
        [19, 8, 0, 5, 8, 10],
        [19, 8, 0, -5, -8, -10, -12, -19]]
})
df

|   | Sequences |
|---|---|
| 0 | [1, 4, 5, 8] |
| 1 | [9, 7, 6, 1, 0] |
| 2 | [8, 9, 3] |
| 3 | [5, 5, 5, 5, 5, 5] |
| 4 | [-19, -8, 0, 5, 8, 10] |
| 5 | [19, 8, 0, 5, 8, 10] |
| 6 | [19, 8, 0, -5, -8, -10, -12, -19] |


In [45]:
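def lst_order(lst: list[int]) -> str:
    """Return 'I' if strictly increasing, 'D' if strictly decreasing, else 'N'."""
    order = []
    for idx in range(len(lst) - 1):
        if lst[idx] > lst[idx + 1]:
            order.append('D')
        elif lst[idx] < lst[idx + 1]:
            order.append('I')
        else:
            order.append('E')
    order = set(order)
    # Exactly one kind of step, and it must be 'I' or 'D' (all 'E' means constant).
    # `and` is required here: `&` binds tighter than `==`.
    if len(order) == 1 and ('I' in order or 'D' in order):
        return order.pop()
    else:
        return 'N'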

In [46]:
df['Answer'] = df['Sequences'].apply(lst_order)

In [47]:
df

|   | Sequences | Answer |
|---|---|---|
| 0 | [1, 4, 5, 8] | I |
| 1 | [9, 7, 6, 1, 0] | D |
| 2 | [8, 9, 3] | N |
| 3 | [5, 5, 5, 5, 5, 5] | N |
| 4 | [-19, -8, 0, 5, 8, 10] | I |
| 5 | [19, 8, 0, 5, 8, 10] | N |
| 6 | [19, 8, 0, -5, -8, -10, -12, -19] | D |


In [85]:
df = pd.DataFrame({
    'Sequences': [
        '1, 4, 5, 8',
        '9, 7, 6,  1, 0',
        '8, 9, 3',
        '5, 5, 5, 5, 5, 5',
        '-19, -8, 0, 5, 8, 10',
        '19, 8, 0, 5, 8, 10',
        '19, 8, 0, -5, -8, -10, -12, -19']
})
df

|   | Sequences |
|---|---|
| 0 | 1, 4, 5, 8 |
| 1 | 9, 7, 6, 1, 0 |
| 2 | 8, 9, 3 |
| 3 | 5, 5, 5, 5, 5, 5 |
| 4 | -19, -8, 0, 5, 8, 10 |
| 5 | 19, 8, 0, 5, 8, 10 |
| 6 | 19, 8, 0, -5, -8, -10, -12, -19 |


In [86]:
# Parse each comma-separated string into a list of ints (int() ignores surrounding spaces)
df['Answer'] = df['Sequences'].apply(lambda x: [int(v) for v in x.split(',')])
df

|   | Sequences | Answer |
|---|---|---|
| 0 | 1, 4, 5, 8 | [1, 4, 5, 8] |
| 1 | 9, 7, 6, 1, 0 | [9, 7, 6, 1, 0] |
| 2 | 8, 9, 3 | [8, 9, 3] |
| 3 | 5, 5, 5, 5, 5, 5 | [5, 5, 5, 5, 5, 5] |
| 4 | -19, -8, 0, 5, 8, 10 | [-19, -8, 0, 5, 8, 10] |
| 5 | 19, 8, 0, 5, 8, 10 | [19, 8, 0, 5, 8, 10] |
| 6 | 19, 8, 0, -5, -8, -10, -12, -19 | [19, 8, 0, -5, -8, -10, -12, -19] |


In [87]:
import numpy as np

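def get_list_element_diff(lst: list) -> np.ndarray:
    # Consecutive differences: lst[i+1] - lst[i]
    return np.diff(lst)

def chk_sort_order(lst: list) -> str:
    # 'I' if every step is positive, 'D' if every step is negative, otherwise 'N'
    diffs = np.diff(lst)
    if np.all(diffs > 0):
        return 'I'
    if np.all(diffs < 0):
        return 'D'
    return 'N'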

In [88]:
df['Answer'] = df['Answer'].apply(chk_sort_order)
df

|   | Sequences | Answer |
|---|---|---|
| 0 | 1, 4, 5, 8 | I |
| 1 | 9, 7, 6, 1, 0 | D |
| 2 | 8, 9, 3 | N |
| 3 | 5, 5, 5, 5, 5, 5 | N |
| 4 | -19, -8, 0, 5, 8, 10 | I |
| 5 | 19, 8, 0, 5, 8, 10 | N |
| 6 | 19, 8, 0, -5, -8, -10, -12, -19 | D |

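pandas can also answer the ordering question directly: `Series.is_monotonic_increasing` and `is_monotonic_decreasing` test non-strict order, and adding `is_unique` makes the test strict. This is an alternative sketch, not the approach used above; `classify_order` is a hypothetical helper name:

```python
import pandas as pd

def classify_order(values: list) -> str:
    """Return 'I' (strictly increasing), 'D' (strictly decreasing) or 'N'."""
    s = pd.Series(values)
    # is_monotonic_* allow ties, so also require unique values for strictness
    if s.is_unique and s.is_monotonic_increasing:
        return "I"
    if s.is_unique and s.is_monotonic_decreasing:
        return "D"
    return "N"

sequences = [[1, 4, 5, 8], [9, 7, 6, 1, 0], [8, 9, 3], [5, 5, 5, 5, 5, 5]]
print([classify_order(seq) for seq in sequences])  # ['I', 'D', 'N', 'N']
```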

In [90]:
arr_2d = np.array(
    [[1, 2, 4],
    [10, 20, 30]]
)

In [91]:
np.diff(arr_2d)

array([[ 1,  2],
       [10, 10]])

In [92]:
np.diff(arr_2d, axis=1)

array([[ 1,  2],
       [10, 10]])

In [93]:
np.diff(arr_2d, axis=0)

array([[ 9, 18, 26]])
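`np.diff` computes `out[i] = a[i+1] - a[i]` along the last axis unless `axis` says otherwise, which is why the default call and `axis=1` give the same result for the 2-D array above. The `n` parameter repeats the differencing:

```python
import numpy as np

a = np.array([1, 4, 5, 8])

first = np.diff(a)        # [4-1, 5-4, 8-5]
second = np.diff(a, n=2)  # differences of the differences

print(first.tolist())   # [3, 1, 3]
print(second.tolist())  # [-2, 2]
```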

In [96]:
df.notnull()

|   | Sequences | Answer |
|---|---|---|
| 0 | True | True |
| 1 | True | True |
| 2 | True | True |
| 3 | True | True |
| 4 | True | True |
| 5 | True | True |
| 6 | True | True |


In [98]:
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

In [100]:
iris.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [101]:
iris['petal_width'].quantile()

np.float64(1.3)

In [102]:
iris.groupby('species')['petal_width'].quantile()

species
setosa        0.2
versicolor    1.3
virginica     2.0
Name: petal_width, dtype: float64

In [111]:
iris.groupby('species')['petal_width'].quantile([0.25, 0.5, 0.6, 0.67, 0.75, 0.95])

species         
setosa      0.25    0.200
            0.50    0.200
            0.60    0.200
            0.67    0.200
            0.75    0.300
            0.95    0.400
versicolor  0.25    1.200
            0.50    1.300
            0.60    1.400
            0.67    1.400
            0.75    1.500
            0.95    1.600
virginica   0.25    1.800
            0.50    2.000
            0.60    2.100
            0.67    2.183
            0.75    2.300
            0.95    2.455
Name: petal_width, dtype: float64
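`quantile()` defaults to `q=0.5` (the median) and, by default, interpolates linearly when the requested position falls between two data points; that is how fractional results such as 2.183 appear. For `n` sorted values the position is `q * (n - 1)`. A hand-checkable sketch:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# 0.5 * (4 - 1) = 1.5 -> halfway between 2.0 and 3.0
print(s.quantile())      # 2.5 (q defaults to 0.5)

# 0.75 * 3 = 2.25 -> a quarter of the way from 3.0 to 4.0
print(s.quantile(0.75))  # 3.25

# interpolation='lower' takes the data point just below the position instead
print(s.quantile(0.75, interpolation="lower"))  # 3.0
```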

In [115]:
import pandas as pd
import numpy as np

# Sample DataFrame with missing values
data = {'Name': ['Alice', 'Bob', np.nan, 'David'], 
        'Age': [25, 30, np.nan, 35], 
        'City': ['New York', np.nan, 'London', 'Paris']}
df = pd.DataFrame(data)

# 1. Identifying missing values
print("Missing value counts per column:\n",df.isnull().sum())

# 2. Removing missing values (dropna)
df_dropped = df.dropna()
print("\nDataFrame after dropping rows with any missing value:\n", df_dropped)

# 3. Imputing with mean (for numerical columns)
df_filled_mean = df.fillna(df.mean(numeric_only=True))
print("\nDataFrame after filling missing 'Age' with mean:\n", df_filled_mean)

# 3b. Imputing with median (alternative to the mean)
df_filled_median = df.fillna(df.median(numeric_only=True))
print("\nDataFrame after filling missing 'Age' with median:\n", df_filled_median)

# 4. Handling outliers (demonstration with 'Age')
# Assume domain knowledge or a plot says ages above 40 are outliers
df['Age_capped'] = df['Age'].clip(upper=40)  # Cap values at 40
print("\nDataFrame with 'Age' capped at 40:\n", df)

# 5. Data type conversion
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')  # Convert to numeric, handling errors
print("\nData types after conversion:\n", df.dtypes)

# 6. Exploratory Data Analysis
print("\nDescriptive statistics:\n", df.describe())

# Group by and aggregate
grouped_data = df.groupby('City')['Age'].mean()
display("\nAverage Age by City:\n", grouped_data)

Missing value counts per column:
 Name    1
Age     1
City    1
dtype: int64

DataFrame after dropping rows with any missing value:
     Name   Age      City
0  Alice  25.0  New York
3  David  35.0     Paris

DataFrame after filling missing 'Age' with mean:
     Name   Age      City
0  Alice  25.0  New York
1    Bob  30.0       NaN
2    NaN  30.0    London
3  David  35.0     Paris

DataFrame after filling missing 'Age' with median:
     Name   Age      City
0  Alice  25.0  New York
1    Bob  30.0       NaN
2    NaN  30.0    London
3  David  35.0     Paris

DataFrame with 'Age' capped at 40:
     Name   Age      City  Age_capped
0  Alice  25.0  New York        25.0
1    Bob  30.0       NaN        30.0
2    NaN   NaN    London         NaN
3  David  35.0     Paris        35.0

Data types after conversion:
 Name           object
Age           float64
City           object
Age_capped    float64
dtype: object

Descriptive statistics:
         Age  Age_capped
count   3.0         3.0
mean   30

'\nAverage Age by City:\n'

City
London       NaN
New York    25.0
Paris       35.0
Name: Age, dtype: float64
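`fillna` also accepts a dict mapping column names to fill values, so each column can get its own strategy in one call. A sketch reusing the small `Name`/`Age`/`City` frame from the cell above; the `'Unknown'` placeholders are an arbitrary choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", np.nan, "David"],
                   "Age": [25, 30, np.nan, 35],
                   "City": ["New York", np.nan, "London", "Paris"]})

filled = df.fillna({
    "Name": "Unknown",          # placeholder text for missing names
    "Age": df["Age"].median(),  # median of 25, 30, 35 -> 30.0
    "City": "Unknown",
})

print(filled)
print(filled.isnull().sum().sum())  # 0 missing values remain
```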

In [117]:
import pandas as pd

# Create a Pandas Series
s = pd.Series([5, 15, 25, 35, 45])
print(s)

# Apply clip with lower=10 and upper=30
clipped_s = s.clip(lower=10, upper=30)

print(clipped_s)

0     5
1    15
2    25
3    35
4    45
dtype: int64
0    10
1    15
2    25
3    30
4    30
dtype: int64


In [118]:
pd.Series([5, 15, 25, 35, 45]).clip(lower=10, upper=30)

0    10
1    15
2    25
3    30
4    30
dtype: int64
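`clip` only bounds existing values: missing values pass through unchanged, which is why row 2 of `Age_capped` earlier is still `NaN`. Passing only `upper` leaves the low end unbounded:

```python
import numpy as np
import pandas as pd

s = pd.Series([5.0, np.nan, 50.0])

capped = s.clip(upper=40)  # only an upper bound; 5.0 is left alone
print(capped.tolist())     # [5.0, nan, 40.0]
```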

In [119]:
df.describe()

|   | Age | Age_capped |
|---|---|---|
| count | 3.0 | 3.0 |
| mean | 30.0 | 30.0 |
| std | 5.0 | 5.0 |
| min | 25.0 | 25.0 |
| 25% | 27.5 | 27.5 |
| 50% | 30.0 | 30.0 |
| 75% | 32.5 | 32.5 |
| max | 35.0 | 35.0 |


In [120]:
st = {1, 2, 3, 4}

In [121]:
import pandas as pd

# Create a sample dataset
data = {
    'Date': pd.date_range(start='2024-02-12', periods=5, freq='D'),
    'Stock': ['AAPL', 'MSFT', 'GOOGL', 'TSLA', 'AMZN'],
    'Open': [182.5, 410.2, 141.3, 196.8, 155.4],
    'Close': [185.2, 415.8, 144.1, 200.5, 158.2],
    'Volume': [5000000, 3200000, 2100000, 4800000, 3900000]
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Display dataset
df

|   | Date | Stock | Open | Close | Volume |
|---|---|---|---|---|---|
| 0 | 2024-02-12 | AAPL | 182.5 | 185.2 | 5000000 |
| 1 | 2024-02-13 | MSFT | 410.2 | 415.8 | 3200000 |
| 2 | 2024-02-14 | GOOGL | 141.3 | 144.1 | 2100000 |
| 3 | 2024-02-15 | TSLA | 196.8 | 200.5 | 4800000 |
| 4 | 2024-02-16 | AMZN | 155.4 | 158.2 | 3900000 |


In [130]:
import pandas as pd
import numpy as np

# Create a date range for 15 days
dates = pd.date_range(start="2024-02-01", periods=15, freq='D')

# Generate sample stock prices
np.random.seed(42)  # For reproducibility
prices = np.random.randint(150, 200, size=15)  # Random prices between 150-200

# Create the DataFrame
df = pd.DataFrame({
    'Date': dates,
    'Stock': 'AAPL',  # Example stock symbol
    'Close': prices  # Closing prices
})

# Calculate 7-day rolling average
df['7_day_avg'] = df['Close'].rolling(window=7).mean().round(2)

# Display dataset
df.head()

|   | Date | Stock | Close | 7_day_avg |
|---|---|---|---|---|
| 0 | 2024-02-01 | AAPL | 188 | |
| 1 | 2024-02-02 | AAPL | 178 | |
| 2 | 2024-02-03 | AAPL | 164 | |
| 3 | 2024-02-04 | AAPL | 192 | |
| 4 | 2024-02-05 | AAPL | 157 | |


In [128]:
df['Close'].rolling(window=7)

Rolling [window=7,center=False,axis=0,method=single]

In [131]:
df

|   | Date | Stock | Close | 7_day_avg |
|---|---|---|---|---|
| 0 | 2024-02-01 | AAPL | 188 | |
| 1 | 2024-02-02 | AAPL | 178 | |
| 2 | 2024-02-03 | AAPL | 164 | |
| 3 | 2024-02-04 | AAPL | 192 | |
| 4 | 2024-02-05 | AAPL | 157 | |
| 5 | 2024-02-06 | AAPL | 170 | |
| 6 | 2024-02-07 | AAPL | 188 | 176.71 |
| 7 | 2024-02-08 | AAPL | 168 | 173.86 |
| 8 | 2024-02-09 | AAPL | 172 | 173.0 |
| 9 | 2024-02-10 | AAPL | 160 | 172.43 |


In [125]:
df['7_day_avg']

0            NaN
1            NaN
2            NaN
3            NaN
4            NaN
5            NaN
6     176.714286
7     173.857143
8     173.000000
9     172.428571
10    167.857143
11    170.142857
12    172.285714
13    172.428571
14    173.142857
Name: 7_day_avg, dtype: float64

In [126]:
df['7_day_avg'].bfill()  # fillna(method='bfill') is deprecated in recent pandas


0     176.714286
1     176.714286
2     176.714286
3     176.714286
4     176.714286
5     176.714286
6     176.714286
7     173.857143
8     173.000000
9     172.428571
10    167.857143
11    170.142857
12    172.285714
13    172.428571
14    173.142857
Name: 7_day_avg, dtype: float64
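A 7-row window leaves the first six rows as `NaN` because fewer than 7 observations exist yet. Back-filling copies a later average into earlier dates, which may be undesirable for time-series work. An alternative is `min_periods`, which averages whatever is available so far. A sketch with made-up prices and a 3-row window:

```python
import pandas as pd

close = pd.Series([10.0, 20.0, 30.0, 40.0])

strict = close.rolling(window=3).mean()                  # NaN until 3 values exist
partial = close.rolling(window=3, min_periods=1).mean()  # average of available values

print(strict.tolist())   # [nan, nan, 20.0, 30.0]
print(partial.tolist())  # [10.0, 15.0, 20.0, 30.0]
```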


In [1]:
print('Hi')

Hi


In [6]:
import re

def longest_alternating_substring(num: int):
    num_str = str(num)  # Convert number to string
    max_substr = ""
    current_substr = num_str[0]

    for i in range(1, len(num_str)):
        if int(num_str[i - 1]) % 2 != int(num_str[i]) % 2:  # parity changed
            current_substr += num_str[i]
        else:
            max_substr = max(max_substr, current_substr, key=len)
            current_substr = num_str[i]

    max_substr = max(max_substr, current_substr, key=len)
    return max_substr

# Example Usage
num = 73568249123
result = longest_alternating_substring(num)
print("Longest alternating odd/even substring:", result)


Longest alternating odd/even substring: 123


In [58]:
num = '103822925723477672'
num_str = str(num)

# pattern = r'(?:(?=[13579][02468])+|(?=[02468][13579]))+'  #+(?:[0-9](?=[13579][02468]|[02468][13579]))+'
# pattern = r"(?3)?(([02468])([13579]))+(?2)?"
# pattern = r'(([13579][02468])+)|(([02468][13579])+)'
# pattern = r'(?:(?:[02468][13579])|(?:[13579][02468]))+'
pattern = r'(?:[02468][13579])+|(?:[13579][02468])+'

lst = re.findall(pattern, num_str)
# lst = [int(item) for tup in lst for item in tup if item != '']
lst


['1038', '2925', '7234', '7672']



In [72]:
import re
import pandas as pd
df = pd.DataFrame(
    {
        'Numbers': ['6579', '142567', '827810510', '9961256799', '826457676143', '450193498396705', '7772405857091285', '103822925723477672', '1234567890123456789']
    }
)

def extract_longest_alternating_substr(num) -> str:
    # Longest even-length alternating run, kept as a string so leading zeros survive
    pattern = r'(?:[13579][02468])+|(?:[02468][13579])+'
    matches = re.findall(pattern, str(num))
    return max(matches, key=len, default='')


df

|   | Numbers |
|---|---|
| 0 | 6579 |
| 1 | 142567 |
| 2 | 827810510 |
| 3 | 9961256799 |
| 4 | 826457676143 |
| 5 | 450193498396705 |
| 6 | 7772405857091285 |
| 7 | 103822925723477672 |
| 8 | 1234567890123456789 |


In [60]:
df['Numbers'].apply(longest_alternating_substring)

0                     65
1                   2567
2                 278105
3                9612567
4                7676143
5                  34983
6                   0585
7                  72347
8    1234567890123456789
Name: Numbers, dtype: object



In [92]:
# Regex pattern for alternating even/odd or odd/even digit runs:
#     (?:[02468][13579])+|(?:[13579][02468])+
#
# Breakdown:
# 1. [02468] - matches any even digit
# 2. [13579] - matches any odd digit
# 3. (?:[02468][13579])+ - matches repeated even-odd pairs
# 4. (?:[13579][02468])+ - matches repeated odd-even pairs
# 5. | - alternation (matches either pattern)
#
# Every repetition consumes a pair of digits, so matches are always even-length.
import re

def find_longest_alternating(s: str) -> str:
    pattern = r'(?:[02468][13579])+|(?:[13579][02468])+'
    # pattern = r'(?:(?:[02468][13579])|(?:[13579][02468]))+'
    # pattern = r'([02468]{1,}?:[13579]{1,}?:)*'
    matches = re.finditer(pattern, s)
    # default='' avoids a ValueError when no alternating pair exists
    return max((m.group() for m in matches), key=len, default='')

# # Test cases
# test_strings = [
#     "123456",      # -> "123456"
#     "2468",        # -> ""
#     "14725

In [93]:
find_longest_alternating('123456')

'123456'

In [94]:
df['Numbers'].apply(find_longest_alternating)

0                    65
1                  2567
2                278105
3                961256
4                767614
5                  4501
6                  0585
7                  1038
8    123456789012345678
Name: Numbers, dtype: object
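Because the pair-based pattern only matches even-length runs, it can return a shorter result than the loop-based `longest_alternating_substring` (rows 3 and 7: `961256` vs `9612567`, and `1038` vs `72347`). One way to capture odd-length runs as well, sketched here as an alternative rather than the notebook's method, is to consume a digit only while the *next* digit has the opposite parity (a lookahead) and then take one final digit. `RUN_PATTERN` and `longest_alternating_run` are hypothetical names:

```python
import re

# Each match is a maximal run: digits whose successor has opposite parity, plus one last digit
RUN_PATTERN = re.compile(r"(?:[13579](?=[02468])|[02468](?=[13579]))*[0-9]")

def longest_alternating_run(digits: str) -> str:
    """Return the longest run of alternating odd/even digits (first one on ties)."""
    return max(RUN_PATTERN.findall(digits), key=len, default="")

print(longest_alternating_run("9961256799"))          # 9612567
print(longest_alternating_run("103822925723477672"))  # 72347
print(longest_alternating_run("73568249123"))         # 123
```

Because each match stops exactly where the parity chain breaks, `findall` splits the string into consecutive maximal runs, so taking the longest one reproduces the loop's results.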

In [101]:
df1 = pd.DataFrame( [['N', 'O'], ['O', 'N']])
df1

|   | 0 | 1 |
|---|---|---|
| 0 | N | O |
| 1 | O | N |


In [98]:
df = pd.DataFrame( [['B', 'I', 'T'], ['I', 'C', 'E'],  ['T', 'I', 'N']])
df

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | B | I | T |
| 1 | I | C | E |
| 2 | T | I | N |


In [99]:
df.T

|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | B | I | T |
| 1 | I | C | I |
| 2 | T | E | N |


In [117]:
df1==df1.T

|   | 0 | 1 |
|---|---|---|
| 0 | True | True |
| 1 | True | True |


In [125]:
import numpy as np
'Yes' if np.all((df1==df1.T)) else 'No'

'Yes'

In [115]:
np.all((df1==df1.T).values)

np.True_

In [110]:
df1.compare(df1.T).isnull().sum()

Series([], dtype: float64)

In [111]:
df.compare(df.T).isnull().sum()

1  self     1
   other    1
2  self     1
   other    1
dtype: int64
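A square DataFrame is symmetric when it equals its own transpose. `DataFrame.equals` performs that comparison in one call (it also requires matching shape and dtypes), which avoids interpreting the output of `compare`. A sketch with the two grids above; `is_symmetric` is a hypothetical helper name:

```python
import pandas as pd

def is_symmetric(frame: pd.DataFrame) -> bool:
    # equals() is True only if shape, labels, dtypes and values all match
    return frame.equals(frame.T)

grid_no = pd.DataFrame([["N", "O"], ["O", "N"]])
grid_bit = pd.DataFrame([["B", "I", "T"], ["I", "C", "E"], ["T", "I", "N"]])

print(is_symmetric(grid_no))   # True
print(is_symmetric(grid_bit))  # False ('E' vs 'I' off the diagonal)
```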

In [23]:
import pandas as pd
import numpy as np

In [40]:
lst = range(10,100000,1)

In [41]:
df = pd.DataFrame({'nums': lst})

In [42]:
df['cubes'] = df['nums'].apply(lambda x: x**3)
df['cube_str'] = df['cubes'].apply(lambda x: [*str(x)])
df['freq'] = df['cube_str'].apply(lambda x: {x.count(str(i)) for i in set(x)})
df['filter'] = df['freq'].apply(lambda x: len(x)==1)

In [44]:
df.loc[df['filter'], 'nums'].values

array([   11,    12,    13,    16,    17,    18,    19,    21,    22,
          24,    27,    29,    32,    35,    38,    41,    59,    62,
          66,    69,    73,    75,    76,    84,    88,    93,    97,
         135,   145,   203,   289,   297,   302,   303,   319,   888,
        1412,  1694,  2078,  4755,  5399,  6181,  6274,  6443,  9078,
        9413,  9709, 22111, 22819, 23894, 24835, 26636, 26881, 28895,
       29631, 30593, 32069, 32687, 32723, 32887, 34699, 35042, 36536,
       36705, 36869, 37568, 40675, 41538, 41674, 42487, 42582, 44673,
       45166, 45438, 45675, 56592, 58524, 65577, 70869, 78183])
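The `freq` column stores the set of digit counts, so a set of size 1 means every digit that appears in the cube appears equally often. The same test for a single number, sketched with `collections.Counter` (`has_uniform_digit_counts` is a hypothetical helper name):

```python
from collections import Counter

def has_uniform_digit_counts(n: int) -> bool:
    """True if every distinct digit of n**3 occurs the same number of times."""
    counts = Counter(str(n ** 3))          # digit -> occurrences
    return len(set(counts.values())) == 1  # one distinct count means uniform

print(11 ** 3, has_uniform_digit_counts(11))  # 1331 True  (1 and 3 twice each)
print(14 ** 3, has_uniform_digit_counts(14))  # 2744 False (4 twice, 2 and 7 once)
print([n for n in range(10, 30) if has_uniform_digit_counts(n)])
# [11, 12, 13, 16, 17, 18, 19, 21, 22, 24, 27, 29]
```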

In [None]:
df['freq']

0              {1, 3}
1                 {2}
2                 {1}
3                 {1}
4              {1, 2}
             ...     
99985    {1, 2, 3, 5}
99986    {1, 2, 3, 6}
99987       {1, 3, 7}
99988    {8, 1, 2, 3}
99989       {9, 4, 1}
Name: freq, Length: 99990, dtype: object