Step 1. Ensure that you have the dataset file named `transactions.csv` in the current directory.

The dataset is a subset of https://www.kaggle.com/ealaxi/paysim1/version/2 which was originally generated as part of the following research:

E. A. Lopez-Rojas , A. Elmir, and S. Axelsson. "PaySim: A financial mobile money simulator for fraud detection". In: The 28th European Modeling and Simulation Symposium-EMSS, Larnaca, Cyprus. 2016

Step 2. Complete the following exercises.

0. Read the dataset (`transactions.csv`) as a Pandas dataframe. Note that the first row of the CSV contains the column names.

0. Return the column names as a list from the dataframe.

0. Return the first k rows from the dataframe.

0. Return a random sample of k rows from the dataframe.

0. Return a list of the unique transaction types.

0. Return a Pandas series of the top 10 transaction destinations with frequencies.

0. Return all the rows from the dataframe for which fraud was detected.

0. Bonus. Return a dataframe that contains the number of distinct destinations that each source has interacted with to, sorted in descending order. You will find [groupby](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) and [agg](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html) useful. The predefined aggregate functions are under `pandas.core.groupby.GroupBy.*`. See the [left hand column](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.nunique.html).

Use the empty cell to test the exercises. If you modify the original `df`, you can rerun the cell containing `exercise_0`.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

#Read the dataset
def exercise_0(file):
    df=pd.read_csv("transactions.csv")
    return df

def exercise_1(df):
    # Return the column names as a list
    return df.columns.tolist()

def exercise_2(df, k):
    # Return the first k rows
    return df.head(k)

def exercise_3(df, k):
    # Return a random sample of k rows
    return df.sample(n=k)

# Now, modify the exercise_4 function based on the correct column name
def exercise_4(df):
    # Check if 'transaction_type' exists, or use the actual column name
    if 'transaction_type' in df.columns:
        return df['transaction_type'].unique().tolist()
    else:
        return f"'transaction_type' column not found! Please check the column names."

def exercise_5(df):
    # Return a Pandas series of the top 10 transaction destinations with frequencies
    return df['destination'].value_counts().head(10)

def exercise_6(df):
    # Return all the rows from the dataframe for which fraud was detected
    return df[df['fraud_detected'] == True]

def exercise_7(df):
    # Bonus: Return a dataframe that contains the number of distinct destinations each source has interacted with, sorted in descending order
    return df.groupby('source')['destination'].nunique().sort_values(ascending=False)



def visual_1(df):
    # Example visualization: Plot the distribution of transaction types
    df['transaction_type'].value_counts().plot(kind='bar')
    plt.title('Distribution of Transaction Types')
    plt.xlabel('Transaction Type')
    plt.ylabel('Frequency')
    plt.show()

def visual_2(df):
    # Example visualization: Plot the top 10 transaction destinations
    df['destination'].value_counts().head(10).plot(kind='bar')
    plt.title('Top 10 Transaction Destinations')
    plt.xlabel('Destination')
    plt.ylabel('Frequency')
    plt.show()


def exercise_custom(df):
    # Custom exercise: Example - Count transactions per hour
    df['hour'] = pd.to_datetime(df['timestamp']).dt.hour
    return df['hour'].value_counts().sort_index()

def visual_custom(df):
    # Custom visualization: Example - Plot transaction counts by hour
    df['hour'] = pd.to_datetime(df['timestamp']).dt.hour
    df['hour'].value_counts().sort_index().plot(kind='line')
    plt.title('Transactions by Hour of the Day')
    plt.xlabel('Hour')
    plt.ylabel('Number of Transactions')
    plt.show()

In [2]:
# Example usage:
df = exercise_0('transactions.csv')


In [3]:
# Test exercise_1: Return column names
print("Column names:")
print(exercise_1(df))

# Test exercise_2: Return the first k rows (example with k=5)
k = 5
print(f"\nFirst {k} rows:")
print(exercise_2(df, k))

# Test exercise_3: Return a random sample of k rows (example with k=5)
print(f"\nRandom sample of {k} rows:")
print(exercise_3(df, k))

# Test exercise_4: Return a list of the unique transaction types
print("\nUnique transaction types:")
print(exercise_4(df))

# Test exercise_5: Return the top 10 transaction destinations with frequencies
print("\nTop 10 transaction destinations with frequencies:")
print(exercise_5(df))

# Test exercise_6: Return all the rows where fraud was detected
print("\nRows where fraud was detected:")
print(exercise_6(df))

# Test exercise_7: Bonus - Return the number of distinct destinations each source has interacted with
print("\nDistinct destinations per source, sorted:")
print(exercise_7(df))


Column names:
['step', 'type', 'amount', 'nameOrig', 'oldbalanceOrg', 'newbalanceOrig', 'nameDest', 'oldbalanceDest', 'newbalanceDest', 'isFraud', 'isFlaggedFraud']

First 5 rows:
   step      type    amount     nameOrig  oldbalanceOrg  newbalanceOrig  \
0     1   PAYMENT   9839.64  C1231006815       170136.0       160296.36   
1     1   PAYMENT   1864.28  C1666544295        21249.0        19384.72   
2     1  TRANSFER    181.00  C1305486145          181.0            0.00   
3     1  CASH_OUT    181.00   C840083671          181.0            0.00   
4     1   PAYMENT  11668.14  C2048537720        41554.0        29885.86   

      nameDest  oldbalanceDest  newbalanceDest  isFraud  isFlaggedFraud  
0  M1979787155             0.0             0.0        0               0  
1  M2044282225             0.0             0.0        0               0  
2   C553264065             0.0             0.0        1               0  
3    C38997010         21182.0             0.0        1               0  

KeyError: 'destination'

Create graphs for the following. 
1. Transaction types bar chart, Transaction types split by fraud bar chart
1. Origin account balance delta v. Destination account balance delta scatter plot for Cash Out transactions

Ensure that the graphs have the following:
 - Title
 - Labeled Axes
 
The function plot the graph and then return a string containing a short description explaining the relevance of the chart.

In [None]:
def visual_1(df):
    def transaction_counts(df):
        # TODO
        pass
    def transaction_counts_split_by_fraud(df):
        # TODO
        pass

    fig, axs = plt.subplots(2, figsize=(6,10))
    transaction_counts(df).plot(ax=axs[0], kind='bar')
    axs[0].set_title('TODO')
    axs[0].set_xlabel('TODO')
    axs[0].set_ylabel('TODO')
    transaction_counts_split_by_fraud(df).plot(ax=axs[1], kind='bar')
    axs[1].set_title('TODO')
    axs[1].set_xlabel('TODO')
    axs[1].set_ylabel('TODO')
    fig.suptitle('TODO')
    fig.tight_layout(rect=[0, 0.03, 1, 0.95])
    for ax in axs:
      for p in ax.patches:
          ax.annotate(p.get_height(), (p.get_x(), p.get_height()))
    return 'TODO'

visual_1(df)


In [None]:
def visual_2(df):
    def query(df):
        # TODO
        pass
    plot = query(df).plot.scatter(x='TODO',y='TODO')
    plot.set_title('TODO')
    plot.set_xlim(left=-1e3, right=1e3)
    plot.set_ylim(bottom=-1e3, top=1e3)
    return 'TODO'

visual_2(df)


Use your newly-gained Pandas skills to find an insight from the dataset. You have full flexibility to go in whichever direction interests you. Please create a visual as above for this query. `visual_custom` should call `exercise_custom`.

In [None]:
def exercise_custom(df):
    # TODO
    pass
    
def visual_custom(df):
    # TODO
    pass

Submission

1. Copy the exercises into `task1.py`.
2. Upload `task1.py` to Forage.

All done!

Your work will be instrumental for our team's continued success.