# Revolut Financial Crime Challenge
## Home Task

# TASK 1 - Communication and SQL familiarity

#### Examine the following SQL query, and explain clearly and succinctly what it means. Will the query work? Explain why or why not. (15 points)

```SQL
WITH processed_users AS (
SELECT left(u.phone_country, 2) AS short_phone_country, u.id 
FROM users u
)
SELECT t.user_id, 
t.merchant_country, 
sum(t.amount / fx.rate / power(10, cd.exponent)) AS amount 
FROM transactions t
JOIN fx_rates fx ON (fx.ccy = t.currency AND fx.base_ccy = 'EUR')
JOIN currency_details cd ON cd.currency = t.currency
JOIN processed_users pu ON pu.id = t.user_id
WHERE t.source = 'GAIA'
AND pu.short_phone_country = t.merchant_country
GROUP BY t.user_id, t.merchant_country

ORDER BY amount DESC;
```

![Screenshot%202019-03-13%20at%2000.07.06.png](attachment:Screenshot%202019-03-13%20at%2000.07.06.png)

**Examine the following SQL query, and explain clearly and succinctly what it means:**

**Will the query work? Explain why or why not.**
___

The query runs, but it returns an empty result because of this condition: **AND pu.short_phone_country = t.merchant_country**. The two sides hold country codes of different lengths, so they never compare equal.

> pu.short_phone_country -> **varchar(2)**, e.g. HU

> t.merchant_country -> **varchar(3)**, e.g. HUN


The fix is to align the merchant country code with the phone country code by truncating it to two characters:

> Replace **AND pu.short_phone_country = t.merchant_country** with **AND pu.short_phone_country = left(t."MERCHANT_COUNTRY",2)**
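
A minimal sketch of the mismatch, using plain Python strings in place of the SQL columns. The `left` helper is a hypothetical stand-in for the SQL function, and the sample values only mirror the formats described above (a `||`-separated phone-country list and a three-letter merchant code). The `POL`/`PL` pair is added here purely as an illustration: two-letter truncation works for codes such as `HUN` or `GBR`, but not every three-letter code starts with its two-letter equivalent.

```python
def left(value: str, n: int) -> str:
    """Hypothetical stand-in for SQL left(value, n): the first n characters."""
    return value[:n]


phone_country = "GB||JE||IM||GG"   # format of users.PHONE_COUNTRY
merchant_country = "GBR"           # format of transactions.MERCHANT_COUNTRY

# Original condition: a 2-character code never equals a 3-character code.
print(left(phone_country, 2) == merchant_country)            # False

# Adjusted condition: both sides truncated to 2 characters.
print(left(phone_country, 2) == left(merchant_country, 2))   # True

# Illustrative caveat (assumption, not taken from the data):
# Poland is "PL" in two letters but "POL" in three, so truncation gives "PO".
print(left("PL", 2) == left("POL", 2))                       # False
```

If such cases matter, a lookup table mapping three-letter to two-letter codes would be more reliable than truncation.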

***
In addition, the currency conversion is wrong: the amount has to be multiplied by the exchange rate, not divided by it.

>Incorrect code - **sum(t."AMOUNT" / fx.rate / power(10, cd.exponent)) AS amount**

>Correct code - **sum(t."AMOUNT" * fx.rate / power(10, cd.exponent)) AS amount**
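
The conversion can be checked with a small worked example. The sketch below assumes that `AMOUNT` is stored in minor units (so `exponent` is 2 for a currency with cents) and that `rate` is the multiplier from the transaction currency into the base currency. The rate of 1.15 is made up for illustration and is not taken from `fx_rates`; `to_base_currency` is a hypothetical helper.

```python
def to_base_currency(amount_minor: int, rate: float, exponent: int) -> float:
    """Convert an amount in minor units to major units of the base currency.

    Mirrors the SQL expression: amount * rate / power(10, exponent).
    """
    return amount_minor * rate / 10 ** exponent


# 1000 minor units = 10.00 in the transaction currency.
# With an assumed rate of 1.15 this is 11.50 in the base currency.
converted = to_base_currency(1000, rate=1.15, exponent=2)
print(round(converted, 2))   # 11.5

# Dividing by the rate, as in the original query, gives a different figure.
divided = 1000 / 1.15 / 10 ** 2
print(round(divided, 2))     # 8.7
```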



![Screenshot%202019-03-13%20at%2000.09.05.png](attachment:Screenshot%202019-03-13%20at%2000.09.05.png)

```SQL
WITH processed_users AS (
    SELECT left(u."PHONE_COUNTRY", 2) AS short_phone_country, u."ID"
    FROM users u
)
SELECT t."USER_ID",
       t."MERCHANT_COUNTRY",
       sum(t."AMOUNT" * fx."rate" / power(10, cd.exponent)) AS amount
FROM transactions t
JOIN fx_rates fx ON (fx.ccy = t."CURRENCY" AND fx.base_ccy = 'EUR')
JOIN currency_details cd ON cd.ccy = t."CURRENCY"
JOIN processed_users pu ON pu."ID" = t."USER_ID"
WHERE t."SOURCE" = 'GAIA'
AND pu.short_phone_country = left(t."MERCHANT_COUNTRY",2)
GROUP BY t."USER_ID", t."MERCHANT_COUNTRY"
ORDER BY amount DESC;
```

# TASK 2 - Communication and SQL familiarity

#### Now it’s your turn! Write a query to identify users whose first transaction was a successful card payment over $10 USD equivalent (10 points)

### Correct SQL Query:
___

```SQL
-- Step 1: take each user's earliest transaction, whatever its type or state
WITH first_tr AS (
    SELECT DISTINCT ON (tr."USER_ID") tr.*
    FROM transactions AS tr
    ORDER BY tr."USER_ID", tr."CREATED_DATE" ASC
)
-- Step 2: keep it only if it is a completed card payment above 10 USD
SELECT ft."USER_ID", ft."CURRENCY", ft."AMOUNT",
       ft."AMOUNT" * fx.rate / power(10, cd.exponent) AS "AMOUNT_IN_USD",
       ft."CREATED_DATE" AS "Date_of_First_Transaction"
FROM first_tr AS ft
JOIN public.fx_rates AS fx ON fx.ccy = ft."CURRENCY" AND fx.base_ccy = 'USD'
JOIN currency_details AS cd ON cd.ccy = ft."CURRENCY"
WHERE ft."TYPE" = 'CARD_PAYMENT'
  AND ft."STATE" = 'COMPLETED'
  AND ft."AMOUNT" * fx.rate / power(10, cd.exponent) > 10
ORDER BY ft."USER_ID";
```
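
The task hinges on what "first transaction" means. The sketch below reads it as the user's earliest transaction of any type and state, and only then checks whether that transaction is a completed card payment above 10 USD. It uses Python's built-in `sqlite3` module with a made-up three-user table whose amounts are already in USD, so the FX join is left out. The table contents, the lower-case column names and the `first_tx_users` helper are all hypothetical and only illustrate the order of the two steps.

```python
import sqlite3

# (user_id, type, state, amount_usd, created_date) - invented sample rows
ROWS = [
    ("u1", "CARD_PAYMENT", "COMPLETED", 25.0, "2018-01-01 10:00:00"),
    ("u1", "TOPUP", "COMPLETED", 50.0, "2018-01-02 10:00:00"),
    ("u2", "TOPUP", "COMPLETED", 100.0, "2018-01-01 09:00:00"),
    ("u2", "CARD_PAYMENT", "COMPLETED", 40.0, "2018-01-03 09:00:00"),
    ("u3", "CARD_PAYMENT", "COMPLETED", 5.0, "2018-01-01 08:00:00"),
]

# The subquery finds each user's earliest timestamp; the outer filter is
# applied only to the row carrying that timestamp.
QUERY = """
SELECT t.user_id
FROM transactions AS t
JOIN (
    SELECT user_id, MIN(created_date) AS first_date
    FROM transactions
    GROUP BY user_id
) AS f ON f.user_id = t.user_id AND f.first_date = t.created_date
WHERE t.type = 'CARD_PAYMENT'
  AND t.state = 'COMPLETED'
  AND t.amount_usd > 10
ORDER BY t.user_id
"""


def first_tx_users(rows):
    """Return users whose earliest transaction is a completed card payment > 10 USD."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions "
        "(user_id TEXT, type TEXT, state TEXT, amount_usd REAL, created_date TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)
    result = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return result


print(first_tx_users(ROWS))  # ['u1']
```

User `u2` has a qualifying card payment, but it comes after a top-up, so only `u1` is returned. Filtering on type, state and amount before picking the earliest row would return `u2` as well.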

### Alternative solution using Python and the pandas library
___

In [None]:
#importing pandas library
import pandas as pd

In [None]:
#loading fraudsters.csv and writing it to the PostgreSQL database
from sqlalchemy import create_engine
engine = create_engine('postgresql://postgres:3150@localhost:5432/Revolut_Home_Task')
df = pd.read_csv('./fraudsters.csv',index_col=0)
df.to_sql(name="fraudsters",
          con=engine,
          index=False,
          if_exists="replace"
         )
#reading the table back to check the upload
df = pd.read_sql_table("fraudsters",engine)

In [25]:
#loading all csv files using pandas
currency_details = pd.read_csv('./currency_details.csv')
fx_rates = pd.read_csv('./fx_rates.csv')
transactions = pd.read_csv('./transactions.csv',index_col=0)

In [23]:
#Merging fx_rates and currency_details tables
fx_rates_exponent = pd.merge(fx_rates, currency_details, how='inner', left_on="ccy", right_on='currency')

In [22]:
#taking ex_rate for USD vs other currencies and dropping out unused columns
rates_in_usd = fx_rates_exponent[fx_rates_exponent['base_ccy']=='USD'].drop(['currency','iso_code','is_crypto','base_ccy'],axis=1)

In [19]:
#Merging transactions and rates_in_usd tables
merged_trans = pd.merge(transactions, rates_in_usd, how='inner', left_on='CURRENCY', right_on='ccy')

In [20]:
#Creating new column "Amount in USD" and applying function Amount * ex_rate / 10**exponent
merged_trans['Amount_in_USD'] = merged_trans['AMOUNT']*merged_trans['rate']/10**merged_trans['exponent']

In [21]:
#Taking the first (earliest) transaction of every user, whatever its type or state
first_trans = merged_trans.sort_values(by = ['USER_ID','CREATED_DATE'],ascending=True ).drop_duplicates(subset = 'USER_ID', keep='first')

In [43]:
#Keeping users whose first transaction is a completed card payment over 10 USD
users_with_10USD_trans = first_trans[(first_trans['STATE'] =="COMPLETED") & (first_trans['TYPE'] == 'CARD_PAYMENT') & (first_trans['Amount_in_USD']>10)]
users_with_10USD_trans.USER_ID


## To save the results to a CSV file, use the command below

In [32]:
users_with_10USD_trans['USER_ID'].to_csv('./users_with_10USD_as_first_transaction.csv',index=False, header=True)

# TASK 3 - Fraudster Radar

#### Find 5 likely fraudsters (not already found in fraudsters.csv!), provide their user_ids, and explain how you found them and why they are likely fraudsters. Use diagrams, illustrations, etc. Show your work! (25 points)
_(Note: show your work! We are looking for data-driven techniques. If you use Excel, provide the working file. If you use Python, send us a Jupyter notebook, etc.)_

In [None]:
#importing pandas library
import pandas as pd

In [53]:
#loading all csv files using pandas
currency_details = pd.read_csv('./currency_details.csv')
fx_rates = pd.read_csv('./fx_rates.csv')
transactions = pd.read_csv('./transactions.csv',index_col=0)
users = pd.read_csv('./users.csv',index_col=0)
fraudsters = pd.read_csv('./fraudsters.csv',index_col=0)
countries = pd.read_csv('./countries.csv',index_col=0)

In [54]:
#Adding to users table information about known fraudsters
users["Fraudster"] = users['ID'].isin( fraudsters['user_id'])

In [57]:
#Selecting the users already known to be fraudsters
fraudsters_details = users[users["Fraudster"]==True]

In [62]:
#Attaching user details to every transaction made by a known fraudster
fraudsters_trans = pd.merge(transactions, fraudsters_details, how='inner',left_on="USER_ID", right_on='ID')

In [67]:
#Listing each known fraudster's transactions in chronological order
fraudsters_trans.sort_values(by = ['USER_ID','CREATED_DATE_x'],ascending=True )

Unnamed: 0,CURRENCY,AMOUNT,STATE_x,CREATED_DATE_x,MERCHANT_CATEGORY,MERCHANT_COUNTRY,ENTRY_METHOD,USER_ID,TYPE,SOURCE,...,KYC,BIRTH_YEAR,COUNTRY,STATE_y,CREATED_DATE_y,TERMS_VERSION,PHONE_COUNTRY,HAS_EMAIL,ID_y,Fraudster
4520,GBP,1000,COMPLETED,2017-09-10 19:12:34.756000,,,misc,0180632d-7737-42af-aaf0-95c2714d7854,P2P,INTERNAL,...,PASSED,1999,GB,LOCKED,2017-08-18 13:03:12.895000,,GB||JE||IM||GG,1,0180632d-7737-42af-aaf0-95c2714d7854,True
4521,GBP,87946,COMPLETED,2017-09-11 20:40:49.880000,,,misc,0180632d-7737-42af-aaf0-95c2714d7854,TOPUP,MINOS,...,PASSED,1999,GB,LOCKED,2017-08-18 13:03:12.895000,,GB||JE||IM||GG,1,0180632d-7737-42af-aaf0-95c2714d7854,True
4525,GBP,88277,COMPLETED,2017-09-12 04:40:01.775000,,,misc,0180632d-7737-42af-aaf0-95c2714d7854,TOPUP,MINOS,...,PASSED,1999,GB,LOCKED,2017-08-18 13:03:12.895000,,GB||JE||IM||GG,1,0180632d-7737-42af-aaf0-95c2714d7854,True
4524,GBP,88348,COMPLETED,2017-09-12 04:40:16.869000,,,misc,0180632d-7737-42af-aaf0-95c2714d7854,TOPUP,MINOS,...,PASSED,1999,GB,LOCKED,2017-08-18 13:03:12.895000,,GB||JE||IM||GG,1,0180632d-7737-42af-aaf0-95c2714d7854,True
4523,GBP,88165,COMPLETED,2017-09-12 04:40:17.971000,,,misc,0180632d-7737-42af-aaf0-95c2714d7854,TOPUP,MINOS,...,PASSED,1999,GB,LOCKED,2017-08-18 13:03:12.895000,,GB||JE||IM||GG,1,0180632d-7737-42af-aaf0-95c2714d7854,True
4522,GBP,88053,COMPLETED,2017-09-12 04:40:21.450000,,,misc,0180632d-7737-42af-aaf0-95c2714d7854,TOPUP,MINOS,...,PASSED,1999,GB,LOCKED,2017-08-18 13:03:12.895000,,GB||JE||IM||GG,1,0180632d-7737-42af-aaf0-95c2714d7854,True
4526,GBP,440700,COMPLETED,2017-09-12 10:01:41.158000,,,misc,0180632d-7737-42af-aaf0-95c2714d7854,BANK_TRANSFER,MINOS,...,PASSED,1999,GB,LOCKED,2017-08-18 13:03:12.895000,,GB||JE||IM||GG,1,0180632d-7737-42af-aaf0-95c2714d7854,True
4527,GBP,70,COMPLETED,2017-09-12 15:55:21.742000,supermarket,GBR,chip,0180632d-7737-42af-aaf0-95c2714d7854,CARD_PAYMENT,GAIA,...,PASSED,1999,GB,LOCKED,2017-08-18 13:03:12.895000,,GB||JE||IM||GG,1,0180632d-7737-42af-aaf0-95c2714d7854,True
10212,GBP,100,REVERTED,2018-03-04 17:14:41.514000,,,misc,018f9e74-458b-4782-8a92-5f95734ddcd7,TOPUP,HERA,...,FAILED,1989,GB,LOCKED,2018-03-04 17:12:44.325000,2018-05-25,GB||JE||IM||GG,1,018f9e74-458b-4782-8a92-5f95734ddcd7,True
10211,GBP,1400,COMPLETED,2018-03-04 17:14:58.862000,,,misc,018f9e74-458b-4782-8a92-5f95734ddcd7,TOPUP,HERA,...,FAILED,1989,GB,LOCKED,2018-03-04 17:12:44.325000,2018-05-25,GB||JE||IM||GG,1,018f9e74-458b-4782-8a92-5f95734ddcd7,True
