# Credit Card Fraud Detection Project Notebook

## By Eng. Ramy Gendy

## Introduction

> It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. I will use various predictive models to see how accurate they are in detecting whether a transaction is a normal payment or a fraud.

<a id='Investigation Overview'></a>
## Investigation Overview

> In this project, I analyze the `Credit Card Dataset`. I pick some of the variables related to the target, start with data wrangling, and then move on to EDA, using different chart types to explore relationships among variables and to answer the questions below.

 **Questions:**

<a href="#01">01. Which Job has the most demand in the market?</a>

<a href="#02">02. Top 5 Sectors with the highest job posts?</a>

<a href="#03">03. Which other job titles that share data science roles have the most demand?</a>

<a href="#04">04. Top 5 Industries with the highest job posts?</a>

<a href="#05">05. What are the top needed skills for each job title?</a>

<a href="#06">06. Which Job Title gets paid the most?</a>

<a href="#0708">07. Which state pays the highest average salary?</a>

<a href="#0708">08. Minimum and Maximum Salaries in different states?</a>

<a href="#09">09. Top 10 States with the Most jobs?</a>

<a href="#10">10. What is the Average Salary of each posted job title in each state?</a>

<a href="#11">11. Is there a correlation between the Average Salary and the company's age, number of competitors, and rating?</a>

<a href="#12">12. Is there a relation between Average Salary, Company Age, and the number of Competitors across jobs?</a>

<a href="#13">13. What is the relation between Average Salary and Company Size with different Job Titles?</a>


## Table of Contents:
 * <a href="#Intro">Introduction.</a>
 * <a href="#Investigation Overview">Investigation Overview.</a>
 * <a href="#Dataset Overview & Understanding">Dataset Overview & Understanding.</a>
 * <a href="#Data Preprocessing">Data Preprocessing:</a>
     * Apply Feature Engineering and Extraction:
       - Domain knowledge features.
       - Apply string operations.
       - Work with Text.
     * Apply Feature Transformations: 
       - Data Cleaning.
       - Work with Missing data.
       - Work with Categorical data.
 * <a href="#Exploratory Data Analysis">Exploratory Data Analysis</a>
 * <a href="#Conclusion">Conclusion</a>
 * <a href="#References">References</a>

In [None]:
# Import libraries
# NumPy: array operations and numerical calculations
import numpy as np
# pandas: loading the dataset and manipulating tabular data
import pandas as pd
# Matplotlib: plotting graphs
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.core.pylabtools import figsize
from matplotlib import rcParams
# Default figure size (width, height) in inches
rcParams['figure.figsize'] = 12,5
# seaborn: statistical plots built on top of Matplotlib
import seaborn as sns

In [None]:
# Suppress all warnings so they are never printed
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Display floats with 2 decimal places and thousands separators
pd.options.display.float_format = "{:,.2f}".format
# Show all columns when displaying a DataFrame
pd.set_option("display.max_columns", None)
#pd.set_option('display.max_colwidth', None)
#pd.set_option('display.max_rows', None)

<a id='Dataset Overview & Understanding'></a>
## Dataset Overview & Understanding

> The _Glassdoor Jobs_ dataset contains 956 rows and 15 columns. Most variables are categorical; some are numerical. Columns such as Salary Estimate and Revenue can be represented as numerical values instead of categorical ones, which makes statistical analysis easier. Some of the variables contain many negative values, which I will need to deal with. Some variables are not of much use, such as the company-profile columns (Size, Founded, Type of ownership, Industry, Sector, Revenue, Competitors), which I did not include in my analysis.
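
As a rough illustration of turning a categorical salary column into numbers, here is a minimal sketch. It assumes the `Salary Estimate` values look like `$53K-$91K (Glassdoor est.)` and that `-1` marks a missing value; both are assumptions about the file, not something confirmed above. The helper `parse_salary_range` and the columns `min_salary`, `max_salary` and `avg_salary` are hypothetical names used only for this example.

```python
import re

import numpy as np
import pandas as pd


def parse_salary_range(text):
    """Return (min, max) in thousands of dollars, or (nan, nan) if no range is found."""
    if not isinstance(text, str):
        return (np.nan, np.nan)
    # Assumed pattern: optional "$", digits, optional "K", a dash, then the same again.
    match = re.search(r"\$?(\d+)\s*K?\s*-\s*\$?(\d+)\s*K?", text)
    if match is None:
        return (np.nan, np.nan)
    return (float(match.group(1)), float(match.group(2)))


# Made-up sample values; the real column may be formatted differently.
sample = pd.DataFrame(
    {
        "Salary Estimate": [
            "$53K-$91K (Glassdoor est.)",
            "$63K-$112K (Glassdoor est.)",
            "-1",  # assumed placeholder for a missing estimate
        ]
    }
)

# Parse every row, then spread the (min, max) tuples into two numeric columns.
bounds = sample["Salary Estimate"].apply(parse_salary_range).tolist()
sample[["min_salary", "max_salary"]] = pd.DataFrame(bounds, index=sample.index)
sample["avg_salary"] = (sample["min_salary"] + sample["max_salary"]) / 2

print(sample)
```

Rows that do not match the assumed pattern end up as `NaN`, so they can be handled later together with the other missing data.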

### Exploratory Data Analysis

Explore the data by reading it, displaying it with head() or tail(), and inspecting it with describe(), info(), unique() and value_counts().
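
The exploration calls listed above can be sketched on a small frame. The rows and the column names `Job Title` and `Rating` below are made up for illustration and are not taken from the dataset; the same calls apply to the real `df` once it is loaded.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame(
    {
        "Job Title": ["Data Scientist", "Data Analyst", "Data Scientist", "Data Engineer"],
        "Rating": [3.8, 4.1, 3.5, 4.4],
    }
)

print(toy.head(2))      # first two rows
print(toy.tail(2))      # last two rows
print(toy.describe())   # summary statistics for the numeric columns
toy.info()              # dtypes and non-null counts (prints directly)

# Distinct values and their frequencies for a categorical column.
unique_titles = toy["Job Title"].unique()
title_counts = toy["Job Title"].value_counts()
print(unique_titles)
print(title_counts)
```

`unique()` and `value_counts()` are called on a single column here because they are most useful for categorical variables, which make up most of this dataset.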

In [None]:
# Read dataset
df = pd.read_csv('glassdoor_jobs.csv')

In [None]:
# view dataset