## SAGEMAKER

![sagemaker](./Sagemaker.jpg)

#### Starting an instance
https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html

#### Git repo
https://github.com/aws/amazon-sagemaker-examples


#### Sagemaker algorithm for training and performing feature engineering 

1. EDA and Feature engineering using Data Wrangler

https://aws.amazon.com/blogs/machine-learning/exploratory-data-analysis-feature-engineering-and-operationalizing-your-data-flow-into-your-ml-pipeline-with-amazon-sagemaker-data-wrangler/


2. Apache MXNet framework

https://aws.amazon.com/mxnet/

https://mxnet.incubator.apache.org/versions/1.9.0/



## Use Cases



## 1. Text extraction with Textract and multi-label text classification
### Problem Statement 

#### Extract text from PDF data using S3, SageMaker and Textract

#### Task

1) Split a large PDF stored in S3 into pages and store the selected pages back in S3
2) Use Textract to extract text from the PDF
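
A minimal sketch of the Textract step, assuming the PDF is already in S3 and that a boto3 Textract client (`boto3.client("textract")`) is passed in. The function name `extract_lines` and the polling parameters are illustrative and not part of the original notes. It uses the asynchronous `start_document_text_detection` / `get_document_text_detection` calls, which accept multi-page PDFs.

```python
import time


def extract_lines(textract_client, bucket, key, poll_seconds=5.0, max_polls=120):
    """Run asynchronous Textract text detection on a PDF in S3 and return its lines."""
    job = textract_client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state.
    for _ in range(max_polls):
        result = textract_client.get_document_text_detection(JobId=job_id)
        if result["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} did not finish in time")

    if result["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with {result['JobStatus']}")

    # Results are paginated: follow NextToken and keep only LINE blocks.
    lines = []
    while True:
        lines.extend(
            block["Text"]
            for block in result.get("Blocks", [])
            if block["BlockType"] == "LINE"
        )
        token = result.get("NextToken")
        if not token:
            return lines
        result = textract_client.get_document_text_detection(
            JobId=job_id, NextToken=token
        )
```

In a notebook this could be called as `extract_lines(boto3.client("textract"), "bucketname", "filename")`, where the bucket and key are placeholders.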


#### Approach
1) Connect to S3 using Python 
https://www.gormanalysis.com/blog/connecting-to-aws-s3-with-python/

2) Create a bucket and push the PDF into the bucket

https://medium.com/@vishal.sharma./create-an-aws-s3-bucket-using-aws-cli-5a19bc1fda79

3) Configure and set up the SageMaker notebook instance using Boto3 


```python
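# Notebook shell command; run it in a Jupyter/SageMaker cell
!pip install boto3
import boto3

# Option 1: resource API.
# The keys below are placeholders; prefer the notebook's IAM role or
# environment credentials over hard-coded secrets.
s3 = boto3.resource(
    service_name='s3',
    region_name='us-east-2',
    aws_access_key_id='mykey',
    aws_secret_access_key='mysecretkey'
)

# S3 bucket identifier
bucket = s3.Bucket(name="my_bucket")

# Option 2: client API (the service name is required)
s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket='bucketname', Key='filename')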
```


4) Load or Download pdf from s3 to Sagemaker notebook \
https://towardsdatascience.com/how-to-read-data-files-on-s3-from-amazon-sagemaker-f288850bfe8f
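
As a rough sketch of this step, the helpers below read an S3 object into memory so that a PDF library can open it without touching the notebook's disk. The function names are hypothetical, and the S3 client (for example `boto3.client("s3")`) is passed in by the caller.

```python
import io


def read_s3_bytes(s3_client, bucket, key):
    """Return the full contents of s3://bucket/key as bytes."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def open_s3_pdf(s3_client, bucket, key):
    """Return a seekable in-memory file object, which PDF readers expect."""
    return io.BytesIO(read_s3_bytes(s3_client, bucket, key))
```

To keep a local copy instead, `s3_client.download_file(bucket, key, local_path)` writes the object straight to a file.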


5) Split the PDF into pages and store them back in S3
```python
from pdfrw import PdfReader, PdfWriter

pages = PdfReader('inputfile.pdf').pages
# Each tuple is a (start, end) page range, 1-based
parts = [(3,6),(7,10)]
for part in parts:
    outdata = PdfWriter(f'pages_{part[0]}_{part[1]}.pdf')
    # range(start, end) excludes end, so (3, 6) writes pages 3 to 5
    for pagenum in range(*part):
        outdata.addpage(pages[pagenum-1])
    outdata.write()
```
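
The snippet above only writes the split files locally. One possible way to push them back to S3 is sketched below; the helper name, the `split/` prefix and the `pages_*.pdf` pattern are assumptions chosen to match the file names written above, and the S3 client is supplied by the caller.

```python
from pathlib import Path


def upload_split_pdfs(s3_client, bucket, local_dir, prefix="split/", pattern="pages_*.pdf"):
    """Upload every file in local_dir matching pattern and return the S3 keys used."""
    keys = []
    for path in sorted(Path(local_dir).glob(pattern)):
        key = f"{prefix}{path.name}"
        # upload_file(Filename, Bucket, Key) streams the local file to S3
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys
```

For example, `upload_split_pdfs(boto3.client("s3"), "my_bucket", ".")` would upload the files written to the current directory.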

#### Checks and Testing for performed task

1) Testing reading a PDF from S3


```python
# Notebook shell command; run it in a Jupyter/SageMaker cell
!pip install pdfminer3
import io

import boto3
from pdfminer3.converter import TextConverter
from pdfminer3.layout import LAParams
from pdfminer3.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer3.pdfpage import PDFPage

# Read the PDF bytes from S3
s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket='bucketname', Key='filename')
data = response['Body'].read()

# Set up pdfminer to write extracted text into an in-memory buffer
resource_manager = PDFResourceManager()
file_handle = io.StringIO()
converter = TextConverter(resource_manager, file_handle, laparams=LAParams())
page_interpreter = PDFPageInterpreter(resource_manager, converter)

# Extract the text of the first page only
for page in PDFPage.get_pages(io.BytesIO(data)):
    page_interpreter.process_page(page)
    break
text = file_handle.getvalue()
print(text)

converter.close()
file_handle.close()

# Store the extracted text back in S3 (plain text, so a .txt key is used)
s3_client.put_object(Bucket='bucketname', Key='test.txt', Body=text.encode('utf-8'))
```

2) Reading a PDF from S3 to perform a specific task (here, reading form fields)
```python
#https://stackoverflow.com/questions/62799852/read-pdf-object-from-s3
# Uses PyPDF2 >= 3.0 names; older releases used PdfFileReader / getFormTextFields
import boto3
from PyPDF2 import PdfReader
from io import BytesIO

bucket_name ="pdf-forms-bucket"
item_name = "form.pdf"


s3 = boto3.resource('s3')
obj = s3.Object(bucket_name, item_name)
fs = obj.get()['Body'].read()
pdf = PdfReader(BytesIO(fs))

# Dictionary mapping text field names to their values
data = pdf.get_form_text_fields()
```

3) Testing various PDF splitting methods locally 

```python
# Uses PyPDF2 >= 3.0 names; older releases used PdfFileReader / PdfFileWriter
from PyPDF2 import PdfReader, PdfWriter

inputpdf = PdfReader("/path to pdf file directory/pdf_name.pdf")

# Write every page to its own single-page PDF
for i, page in enumerate(inputpdf.pages):
    output = PdfWriter()
    output.add_page(page)
    with open("document-page%s.pdf" % i, "wb") as outputStream:
        output.write(outputStream)
```

#### Issues 
https://stackoverflow.com/questions/65844539/how-to-use-pdfminer-to-extract-text-from-pdf-files-stored-in-s3-bucket-without-delattr \
https://stackoverflow.com/questions/66206423/how-to-upload-a-pdfusing-pdfpages-to-aws-s3-in-python \
https://stackabuse.com/example-upload-a-file-to-aws-s3-with-boto/ \
https://stackoverflow.com/questions/490195/split-a-multi-page-pdf-file-into-multiple-pdf-files-with-python \
https://programtalk.com/python-examples/pdfminer.pdfpage.PDFPage.create_pages/ \
https://www.analyticsvidhya.com/blog/2021/09/pypdf2-library-for-working-with-pdf-files-in-python/ \
https://realpython.com/pdf-python/  \
https://www.blog.pythonlibrary.org/2018/04/11/splitting-and-merging-pdfs-with-python/ \
https://stackoverflow.com/questions/62799852/read-pdf-object-from-s3


## 2. Image classification 
#### Problem statement

Casting product image data for quality inspection

#### Tasks 
1. Download the data from Kaggle
2. Store the data in S3 
3. Use image libraries to visualize sample images
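
Before visualizing samples, it can help to check how many images of each class ended up in S3. The sketch below assumes the images were uploaded with one folder per class (for instance `ok_front/` and `def_front/`, which is an assumption about the upload layout) and that an S3 client is passed in. The helper names are illustrative.

```python
from collections import Counter
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_image_keys(s3_client, bucket, prefix=""):
    """Return all image keys under prefix, following list_objects_v2 pagination."""
    keys, kwargs = [], {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            if PurePosixPath(obj["Key"]).suffix.lower() in IMAGE_SUFFIXES:
                keys.append(obj["Key"])
        if not page.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def count_per_class(keys):
    """Count images per class, taking the parent folder name as the label."""
    return Counter(PurePosixPath(key).parent.name for key in keys)
```

A few keys from each class can then be read with `get_object` and opened with an image library such as Pillow or matplotlib.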


#### Reference
https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/imageclassification_caltech/Image-classification-fulltraining.html#Realtime-inference

https://blog.jovian.ai/metal-casting-product-image-classification-for-quality-inspection-using-pytorch-72c696d205f3

https://aws.amazon.com/blogs/machine-learning/detect-manufacturing-defects-in-real-time-using-amazon-lookout-for-vision/

https://www.kaggle.com/ravirajsinh45/real-life-industrial-dataset-of-casting-product
https://www.kaggle.com/souvikg544/souvik-ghosh-casting


https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/imageclassification_caltech/Image-classification-fulltraining-highlevel.html



## 3. Predictive Maintenance using SageMaker and Lambda 

Predictive maintenance aims to optimize the balance between corrective and preventive maintenance by enabling just-in-time replacement of components. This approach minimizes the cost of unscheduled maintenance and maximizes each component's lifespan, so more value is obtained from every part.

The output is an estimate of the time remaining before a failure is expected to happen.

Machine learning-based predictive maintenance is mainly built with one of the two techniques below, described [here](https://docs.aws.amazon.com/solutions/latest/predictive-maintenance-using-machine-learning/predictive-maintenance-using-machine-learning.pdf#welcome). 

1. Classification approach: predicts whether a failure is likely within a given number of upcoming steps.  

2. Regression approach: predicts the time left before a system fails. This predicted time is also called the Remaining Useful Life (RUL).
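
The two approaches differ mainly in how the training target is built. The sketch below assumes run-to-failure data in which the last recorded cycle of each unit is its failure point. The function names and the `window` parameter are illustrative.

```python
def remaining_useful_life(cycles_by_unit):
    """Regression target: map each unit to (cycle, RUL) pairs, RUL = last cycle - cycle."""
    labels = {}
    for unit, cycles in cycles_by_unit.items():
        last = max(cycles)
        labels[unit] = [(c, last - c) for c in sorted(cycles)]
    return labels


def fails_within(rul_pairs, window):
    """Classification target: 1 if the failure occurs within `window` cycles, else 0."""
    return [(cycle, int(rul <= window)) for cycle, rul in rul_pairs]
```

For a unit observed over cycles 1 to 4, the RUL targets are 3, 2, 1, 0, and with `window=1` the classification targets are 0, 0, 1, 1.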

A more detailed explanation is available [here](https://towardsdatascience.com/how-to-implement-machine-learning-for-predictive-maintenance-4633cdbe4860).

#### Predictive maintenance using ML overview
https://analyticsindiamag.com/machine-learning-for-predictive-maintenance-key-approaches-techniques-to-consider/
https://tryolabs.com/blog/2020/09/03/predictive-maintenance-using-machine-learning

#### Topics 
https://github.com/topics/predictive-maintenance


### Real applications of predictive maintenance with machine learning
1. Oil pumps maintenance
2. Satellites' terminals monitoring (and maintenance)
3. Thermal imagery + machine learning in electrical substations
4. Predictive maintenance can prevent car breakdowns, aviation accidents, and failure of mission-critical systems exposing people and resources to risks.   

Advantages
1. Reduced downtime and even reduced maintenance costs.

Challenges
1. Hard-to-access places or remote environments are not obstacles to applying predictive maintenance.
2. It is more important to be able to read, process, and store valuable data than to have the latest and shiniest machine learning models working on worthless data.


### AWS Implementation
Predictive Maintenance Using Machine Learning deploys a machine learning (ML) model and an example dataset of turbofan degradation simulation data to train the model to recognize potential equipment failures.  

Predictive Maintenance Using Machine Learning enables you to execute automated data processing on an example dataset or your own dataset. The included ML model detects potential equipment failures and provides recommended actions to take. The diagram below presents the architecture you can automatically deploy using the solution’s implementation guide and accompanying AWS CloudFormation template.


 
### AWS Architectures

<!-- ![Predictive maintainance](predictive_maintainance.jpg) -->
### Predictive maintenance architecture 1

![Predictive maintenance architecture 1](./predictive_maintainence/images/predictive_maintenance_using_machine-learning.png)

The [workflow](https://docs.aws.amazon.com/solutions/latest/predictive-maintenance-using-machine-learning/predictive-maintenance-using-machine-learning.pdf#welcome) is described in the solution guide, the code is in the [git repo](https://github.com/awslabs/predictive-maintenance-using-machine-learning), and a notebook for the workflow is [here](https://github.com/awslabs/predictive-maintenance-using-machine-learning/blob/master/source/notebooks/sagemaker_predictive_maintenance.ipynb).
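
One common way to combine Lambda with SageMaker is to have a Lambda function forward sensor readings to a deployed endpoint. The sketch below is only an illustration of that pattern and is not taken from the AWS solution linked above. The event shape (`{"readings": ...}`), the JSON content type and the helper name `make_handler` are all assumptions, and the `sagemaker-runtime` client is passed in.

```python
import json


def make_handler(runtime_client, endpoint_name):
    """Build a Lambda-style handler that forwards sensor readings to an endpoint."""

    def handler(event, context):
        # Assumed event shape: {"readings": [[...], ...]}
        payload = json.dumps(event["readings"])
        response = runtime_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=payload,
        )
        # The endpoint is assumed to answer with a JSON document
        prediction = json.loads(response["Body"].read())
        return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}

    return handler
```

In a Lambda module this could be wired up as `lambda_handler = make_handler(boto3.client("sagemaker-runtime"), os.environ["ENDPOINT_NAME"])`, where `ENDPOINT_NAME` is a hypothetical environment variable.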

### Predictive maintenance architecture 2

![Predictive maintenance architecture 2](./predictive_maintainence/images/Predictive_maintainence_architecture1.jpg)


### Data collection 
https://data.nasa.gov/Aerospace/Turbofan-engine-degradation-simulation-data-set/vrks-gjie
https://www.kaggle.com/c/predictive-maintenance/data
https://github.com/awslabs/predictive-maintenance-using-machine-learning



## 4. Amazon SageMaker and 🤗 Transformers: Train and Deploy a Summarization Model with a Custom Dataset

#### Reference 
https://towardsdatascience.com/amazon-sagemaker-and-transformers-train-and-deploy-a-summarization-model-with-a-custom-dataset-5efc589fedad

https://aws.amazon.com/blogs/machine-learning/distributed-fine-tuning-of-a-bert-large-model-for-a-question-answering-task-using-hugging-face-transformers-on-amazon-sagemaker/?nc1=b_rp

https://aws.amazon.com/blogs/machine-learning/detect-nlp-data-drift-using-custom-amazon-sagemaker-model-monitor/?nc1=b_rp

## 5. A gene-editing prediction engine with iterative learning cycles built on AWS
https://aws.amazon.com/blogs/storage/a-gene-editing-prediction-engine-with-iterative-learning-cycles-built-on-aws/?nc1=b_rp




### AWS
https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/automatically-extract-content-from-pdf-files-using-amazon-textract.html

https://docs.aws.amazon.com/AmazonS3/latest/userguide/IndexDocumentSupport.html

https://www.gormanalysis.com/blog/connecting-to-aws-s3-with-python/
https://realpython.com/python-boto3-aws-s3/


#### Sagemaker

https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html

https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-mlconcepts.html

https://aws.amazon.com/getting-started/hands-on/build-train-deploy-machine-learning-model-sagemaker/

https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html

https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html


#### Lambda 

https://aws.amazon.com/getting-started/hands-on/run-serverless-code/




#### Case Study

https://aws.amazon.com/blogs/machine-learning/curalate-makes-social-sell-with-ai-using-apache-mxnet-on-aws/
https://www.wired.com/brandlab/2018/09/celgene-advances-pharma-research-discovery-help-ai/

#### Use case download
https://pages.awscloud.com/GLOBAL-ln-GC-400-OTH-INFR-Accelerate-ML-with-Cloud-Services-and-Infrastructure-eBook-2021-learn.html?sc_icampaign=Other_glbl_site_merch-gc-400-oth-infr&sc_ichannel=ha&sc_icontent=awssm-9653_aquire&sc_ioutcome=Global_Marketing_Campaigns&sc_iplace=2up&trk=ha_a134p000007CwuEAAS~ha_awssm-9653_aquire&trkCampaign=GLBL-FY21-Q3-GC-400-OTH-INFR-Accelerate_ML





## 6. Detect NLP data drift using custom Amazon SageMaker Model Monitor
https://aws.amazon.com/blogs/machine-learning/detect-nlp-data-drift-using-custom-amazon-sagemaker-model-monitor/?nc1=b_rp


#### How Kustomer utilizes custom Docker images & Amazon SageMaker to build a text classification pipeline
https://aws.amazon.com/blogs/machine-learning/how-kustomer-utilizes-custom-docker-images-amazon-sagemaker-to-build-a-text-classification-pipeline/?nc1=b_rp

#### Sagemaker with Kubernetes

https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks.html

#### Amazon SageMaker Operators for Kubernetes

https://sagemaker.readthedocs.io/en/v1.57.0/amazon_sagemaker_operators_for_kubernetes.html#id82

#### Bring your own SageMaker image
https://docs.aws.amazon.com/sagemaker/latest/dg/studio-byoi.html

#### Amazon SageMaker Operators for Kubernetes(Imp)
https://sagemaker.readthedocs.io/en/v1.57.0/amazon_sagemaker_operators_for_kubernetes.html

#### OVERVIEW OF CONTAINERS FOR AMAZON SAGEMAKER
https://sagemaker-workshop.com/custom/containers.html

#### Create a SageMaker image from the ECR container image
https://docs.aws.amazon.com/sagemaker/latest/dg/studio-byoi-sdk-create-image.html

### Deploy a Hugging Face Transformers Model from S3 to Amazon SageMaker
https://www.youtube.com/watch?v=pfBGgSGnYLs

#### Hugging Face in AWS 
https://huggingface.co/docs/sagemaker/main



#### iFood Implements AI Area to Enhance Client and Restaurants Experience Using AWS
https://aws.amazon.com/solutions/case-studies/ifood/

#### Food & Beverage Data Set
https://aws.amazon.com/marketplace/pp/prodview-swd4cmvolle2s#overview

https://aws.amazon.com/travel-and-hospitality/restaurants/

https://aws.amazon.com/blogs/industries/restaurants-are-hungry-for-data-insights-as-food-delivery-analytics-gains-traction/


https://www.repustate.com/banking-sentiment-analysis-and-text-analytics/#:~:text=Sentiment%20analysis%20consists%20of%20the,is%20positive%2C%20negative%20or%20neutral.
