Machine Learning

NOTE: I created this repository for educational purposes. It hosts a number of projects and exercises created as part of a learning process, so the code is not polished.

Intro

Project A

With an upsurge in cybercrimes related to Sim Card Swap fraud in developing countries, fraud detection is a top priority. If we can estimate whether someone is likely to commit Sim Card Swap fraud, we can try to prevent it early.

Intro

Predicting the likelihood of Sim Card Swap Fraud Occurrence.

  • Train and test the data samples
  • Normalize and summarize the data
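The two steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes scikit-learn is available, uses the first ten rows of the sample table further down, and the 70/30 split and `random_state` are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# First ten rows of the sample table.
# Columns: Age, Subscriber Complaints, Monthly Payments (KSH), Contacts
X = np.array([
    [30, 3, 1200, 20],
    [18, 2, 60, 10],
    [60, 1, 180, 44],
    [25, 2, 200, 30],
    [30, 2, 300, 10],
    [45, 1, 900, 55],
    [50, 3, 120, 20],
    [78, 1, 60, 10],
    [26, 1, 180, 44],
    [23, 2, 200, 30],
], dtype=float)
y = np.array([0, 1, 0, 0, 1, 0, 0, 1, 0, 0])  # 1 = swapped, 0 = not swapped

# Hold out 30% of the rows for testing (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Summarize the raw training data, then normalize it.
print("raw mean per column:", X_train.mean(axis=0).round(1))
print("raw std per column: ", X_train.std(axis=0).round(1))

# Fit the scaler on the training rows only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training rows only keeps information from the test rows out of the normalization step.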

Mode

Develop

Implementations

  • Define Problem
  • Prepare Data
  • Evaluate Algorithms
  • Improve Results
  • Present Results

Usage

Sim Card Swap Fraud Detection.

Model Used

  • Logistic Regression. Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis.
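To make the binary outcome concrete, here is a small sketch of how logistic regression turns a weighted sum of features into a probability and then into a 0/1 label. The weights, bias, and 0.5 threshold are hypothetical values chosen for illustration; they are not fitted to the data in this repository.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class (swap) for one customer."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients for: Age, Complaints, Monthly Payments, Contacts
weights = [0.01, 0.5, -0.001, -0.05]
bias = -0.5

customer = [18, 2, 60, 10]
p = predict_proba(customer, weights, bias)

# Threshold the probability to get the dichotomous outcome.
label = 1 if p >= 0.5 else 0
print(f"P(swap) = {p:.3f}, predicted label = {label}")
```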

Data

Sample Dataset

There can be many reasons why someone would want to swap their SIM card; I will use just a few. A swap is represented by 1 and no swap by 0. I created this data for this exercise.

Sample Output Representation:

| Swap | Not Swapped |
| --- | --- |
| 1 | 0 |
  • Sample fake data, set in Nairobi. No real data was available for this case, so I created my own. A Location column is included, but I do not use it as a feature, since many customers can live in the same location.
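| ID | Location | Age | Subscriber Complaints | Monthly Payments KSH | Contacts | Swap Agent |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | N/A | 30 | 3 | 1200 | 20 | 0 |
| 2 | N/A | 18 | 2 | 60 | 10 | 1 |
| 3 | N/A | 60 | 1 | 180 | 44 | 0 |
| 4 | N/A | 25 | 2 | 200 | 30 | 0 |
| 5 | N/A | 30 | 2 | 300 | 10 | 1 |
| 6 | N/A | 45 | 1 | 900 | 55 | 0 |
| 7 | N/A | 50 | 3 | 120 | 20 | 0 |
| 8 | N/A | 78 | 1 | 60 | 10 | 1 |
| 9 | N/A | 26 | 1 | 180 | 44 | 0 |
| 10 | N/A | 23 | 2 | 200 | 30 | 0 |
| 11 | N/A | 33 | 2 | 300 | 10 | 1 |
| 12 | N/A | 45 | 1 | 1200 | 55 | 0 |
| 13 | N/A | 30 | 2 | 800 | 100 | 0 |
| 14 | N/A | 33 | 6 | 60 | 90 | 1 |
| 15 | N/A | 26 | 1 | 180 | 44 | 0 |
| 16 | N/A | 23 | 2 | 2000 | 30 | 0 |
| 17 | N/A | 33 | 2 | 3000 | 10 | 1 |
| 18 | N/A | 45 | 1 | 1200 | 55 | 0 |
| 19 | N/A | 66 | 1 | 50 | 100 | 0 |
| 20 | N/A | 78 | 1 | 60 | 10 | 1 |
| 21 | N/A | 26 | 1 | 180 | 44 | 0 |
| 22 | N/A | 23 | 2 | 2000 | 30 | 0 |
| 23 | N/A | 33 | 2 | 300 | 10 | 1 |
| 24 | N/A | 45 | 1 | 1200 | 55 | 0 |
| 25 | N/A | 66 | 1 | 50 | 100 | 0 |
| 26 | N/A | 78 | 1 | 60 | 10 | 1 |
| 27 | N/A | 26 | 1 | 180 | 44 | 0 |
| 28 | N/A | 23 | 2 | 200 | 30 | 0 |
| 29 | N/A | 33 | 2 | 3000 | 10 | 1 |
| 30 | N/A | 45 | 1 | 1200 | 55 | 0 |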
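As a rough sketch of how the sample data could be loaded and fed to a logistic regression model, the snippet below builds a pandas DataFrame from the 30 rows and fits a scikit-learn pipeline. The shortened column names, the scaling step, and fitting on all rows at once are assumptions made for illustration; the Location column is left out, as described above.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Column names are shortened for this sketch (an assumption, not the repo's names).
columns = ["Age", "Complaints", "MonthlyPaymentsKSH", "Contacts", "Swap"]
rows = [
    (30, 3, 1200, 20, 0), (18, 2, 60, 10, 1), (60, 1, 180, 44, 0),
    (25, 2, 200, 30, 0), (30, 2, 300, 10, 1), (45, 1, 900, 55, 0),
    (50, 3, 120, 20, 0), (78, 1, 60, 10, 1), (26, 1, 180, 44, 0),
    (23, 2, 200, 30, 0), (33, 2, 300, 10, 1), (45, 1, 1200, 55, 0),
    (30, 2, 800, 100, 0), (33, 6, 60, 90, 1), (26, 1, 180, 44, 0),
    (23, 2, 2000, 30, 0), (33, 2, 3000, 10, 1), (45, 1, 1200, 55, 0),
    (66, 1, 50, 100, 0), (78, 1, 60, 10, 1), (26, 1, 180, 44, 0),
    (23, 2, 2000, 30, 0), (33, 2, 300, 10, 1), (45, 1, 1200, 55, 0),
    (66, 1, 50, 100, 0), (78, 1, 60, 10, 1), (26, 1, 180, 44, 0),
    (23, 2, 200, 30, 0), (33, 2, 3000, 10, 1), (45, 1, 1200, 55, 0),
]
data = pd.DataFrame(rows, columns=columns)

features = ["Age", "Complaints", "MonthlyPaymentsKSH", "Contacts"]
X = data[features]
y = data["Swap"]  # 1 = swapped, 0 = not swapped

# Scale the features, then fit logistic regression on all 30 rows.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)
predictions = model.predict(X)

print("swapped rows:", int(y.sum()), "of", len(data))
print("training accuracy:", model.score(X, y))
```

Training accuracy on such a small hand-made dataset says little about real-world performance; a held-out test split gives a more honest estimate.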

Preview of Data

data.describe()

[Figure: describe]
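For readers who want to reproduce the preview, here is a small sketch of what `data.describe()` returns. It assumes `data` is a pandas DataFrame and uses only the first five rows of the sample table, with shortened column names chosen for this example.

```python
import pandas as pd

# First five rows of the sample table (column names are assumptions).
data = pd.DataFrame({
    "Age": [30, 18, 60, 25, 30],
    "Complaints": [3, 2, 1, 2, 2],
    "MonthlyPaymentsKSH": [1200, 60, 180, 200, 300],
    "Contacts": [20, 10, 44, 30, 10],
    "Swap": [0, 1, 0, 0, 1],
})

# describe() reports count, mean, std, min, quartiles and max per numeric column.
summary = data.describe()
print(summary)
```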

Data Visualization

  • Graphing the features in a pair plot

[Figure: swap_fraud]
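A pair plot of the features can be produced in several ways. The sketch below uses `pandas.plotting.scatter_matrix` with matplotlib, which may differ from the plotting library used for the figure in this README. The column names, colors, and output path are assumptions for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix

# First ten rows of the sample table (column names are assumptions).
data = pd.DataFrame({
    "Age": [30, 18, 60, 25, 30, 45, 50, 78, 26, 23],
    "Complaints": [3, 2, 1, 2, 2, 1, 3, 1, 1, 2],
    "MonthlyPaymentsKSH": [1200, 60, 180, 200, 300, 900, 120, 60, 180, 200],
    "Contacts": [20, 10, 44, 30, 10, 55, 20, 10, 44, 30],
    "Swap": [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
})
features = ["Age", "Complaints", "MonthlyPaymentsKSH", "Contacts"]

# Color each point by its label: blue = not swapped, red = swapped.
colors = data["Swap"].map({0: "tab:blue", 1: "tab:red"}).tolist()

# One scatter plot per feature pair, histograms on the diagonal.
axes = scatter_matrix(data[features], c=colors, figsize=(8, 8), diagonal="hist")

out_path = os.path.join(tempfile.gettempdir(), "swap_pairplot.png")
plt.savefig(out_path)
plt.close("all")
```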

Results

Score: 0.625. Not too bad, given that the data is random.

ROC

accuracy
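As a sketch of how accuracy and a ROC curve are typically computed with scikit-learn, the example below uses four hypothetical labels and predicted probabilities. These numbers are made up for illustration and are unrelated to the score reported above.

```python
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve

# Hypothetical true labels and predicted swap probabilities.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

# Accuracy needs hard labels, so threshold the probabilities at 0.5.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]
accuracy = accuracy_score(y_true, y_pred)

# The ROC curve sweeps over all thresholds instead of fixing one.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
auc = roc_auc_score(y_true, y_prob)

print("accuracy:", accuracy)
print("AUC:", auc)
```

Accuracy depends on the chosen threshold, while the area under the ROC curve summarizes how well the probabilities rank swaps above non-swaps across all thresholds.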

Contributing

Read Contributing

Machine learning algorithms:

Linear Algorithms:

  • Algorithm 1: Linear Regression
  • Algorithm 2: Logistic Regression
  • Algorithm 3: Linear Discriminant Analysis

Nonlinear Algorithms:

  • Algorithm 4: Classification and Regression Trees
  • Algorithm 5: Naive Bayes
  • Algorithm 6: K-Nearest Neighbors
  • Algorithm 7: Learning Vector Quantization
  • Algorithm 8: Support Vector Machines

Ensemble Algorithms:

  • Algorithm 9: Bagged Decision Trees and Random Forest
  • Algorithm 10: Boosting and AdaBoost
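Several of the algorithms listed above can be compared side by side with cross-validation. The sketch below is one possible way to do that with scikit-learn on a synthetic dataset from `make_classification` (an assumption, not this repository's data). Linear Regression is left out because it is not a classifier, and Learning Vector Quantization is left out because scikit-learn does not ship an implementation.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(
    n_samples=120, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
    "Decision Tree (CART)": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy per model.
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()

# Print the models from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```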

License

Copyright [2018] [Madonah Syombua]

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.