**Purpose**

This Jupyter notebook acts as the data producer: the data source used for stream processing.

It generates stock market price data as JSON records with the following keys:

symbol - The company's ticker symbol

price - The simulated stock price of the company at the time the script runs

After the data is randomly generated, it is pushed to a Kinesis data stream (the equivalent of a topic).
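As a minimal sketch of what one such record looks like on its way to the stream, the snippet below builds a single record and serializes it to JSON bytes. The values are made up for illustration; the key names `symbol` and `price` are the ones used by the code in this notebook.

```python
import json

# A single illustrative record (values are made up)
record = {'symbol': 'NVDA', 'price': 143}

# Serialize the dictionary to a JSON string, then encode the string to bytes
payload = json.dumps(record).encode('utf-8')

print(payload)
```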


In [63]:
import boto3

In [64]:
kinesis_client = boto3.client('kinesis')

**What does *boto3.client('kinesis')* do?**

Here's a breakdown:


boto3 is the library (the top-level module). 

boto3.client() is a function provided by that module.

When you call boto3.client('kinesis'), it returns a service client object for the AWS service you specify (here, Kinesis).

This object is not an instance of a class called Kinesis.
Instead, it's a generic low-level client object that knows how to talk to the Kinesis API.

So what is the type of that client object?

If you run print(type(kinesis_client)), you'll get something like: <class 'botocore.client.Kinesis'>
That Kinesis class is not a hand-written class; it's generated at runtime by botocore,
which builds classes on the fly based on service definitions.

The client object has many methods, such as create_stream(), put_record(), and put_records().
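To illustrate what "generated at runtime" means, here is a minimal sketch of building a class on the fly with Python's built-in type(). This is not botocore's actual code: the helper make_client_class and the simplified list of operations are hypothetical, and only mimic the idea of turning a service definition into methods.

```python
def make_client_class(service_name, operations):
    """Build a class at runtime from a (hypothetical) service definition."""

    def make_method(operation):
        # Each generated method just reports which API operation it stands for
        def method(self, **kwargs):
            return {'operation': operation, 'params': kwargs}
        method.__name__ = operation
        return method

    namespace = {op: make_method(op) for op in operations}
    # type(name, bases, namespace) creates a new class object dynamically
    return type(service_name, (object,), namespace)


# Hypothetical, heavily simplified "service definition"
Kinesis = make_client_class('Kinesis', ['create_stream', 'put_record', 'put_records'])
client = Kinesis()

print(type(client).__name__)
print(client.put_record(StreamName='demo', PartitionKey='NVDA'))
```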

In [44]:
#this function generates simulated stock prices for a set of companies using a random range (+/-10%) of the current price
#it returns a list of dictionaries, each item in the list is a dictionary that contains the symbol name and the price
def create_stock_market_data():
    
    from random import randrange
    
    #get the current stock prices of some companies on 25/01/2025
    companies_price = {
                        'NVDA': {'current_price':143},
                        'AAPL': {'current_price':223},
                        'MSFT': {'current_price':444},
                        'AMZN': {'current_price':234},
                        'GOOGL': {'current_price':200},
                        'META': {'current_price':647},
                        'TSLA': {'current_price':407},
                        'WMT': {'current_price':95},
                        'JPM': {'current_price':265},
                        'V': {'current_price':330},
                        'ORCL': {'current_price':184},
                        'MA': {'current_price':490},
                        'XOM': {'current_price':109},
                        'NFLX': {'current_price':978},
                        'PG': {'current_price':164},
                        'SAP': {'current_price':276}
                        }
    
    
    stock_data = []
    
    for k in companies_price:
        
        #create a min and max price for each current price by adding and removing 10% from the current price
        companies_price[k]['min_price'] = int(companies_price[k]['current_price'] * 0.9)
        companies_price[k]['max_price'] = int(companies_price[k]['current_price'] * 1.1)
        
        #generate a new random price from min_price up to, but not including, max_price (randrange excludes the upper bound)
        companies_price[k]['current_price'] = randrange(companies_price[k]['min_price'], companies_price[k]['max_price'])
        
        #delete the keys min_price, max_price since they're not needed anymore
        del companies_price[k]['min_price'] 
        del companies_price[k]['max_price']
        
        stock_data.append({'symbol': k, 'price': companies_price[k]['current_price']})
                                
    return stock_data
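One detail of the function above: randrange(a, b) never returns b, so the upper bound is excluded, and int() truncates the +/-10% limits. A minimal alternative sketch, assuming you want both bounds included and repeatable output, is shown below. The helper name simulate_price and the seed value are hypothetical.

```python
import random


def simulate_price(current_price, rng):
    """Return a random integer price within +/-10% of current_price (bounds included)."""
    min_price = int(current_price * 0.9)
    max_price = int(current_price * 1.1)
    # randint includes both endpoints, unlike randrange
    return rng.randint(min_price, max_price)


# A seeded generator makes the simulated prices repeatable between runs
rng = random.Random(42)
prices = {symbol: simulate_price(price, rng)
          for symbol, price in {'NVDA': 143, 'WMT': 95}.items()}
print(prices)
```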

In [50]:
records = create_stock_market_data()

In [78]:
#This function receives a list of dictionaries as an input parameter, transforms each one to JSON and 
#encodes it (from string to bytes), then pushes the whole batch into a stream

def write_records_to_stream(records):
    
    import json
    
    #the records to send are supplied by the caller through the records parameter
    
    encoded_records = [] #this list will hold the encoded records
    
    #for each record, convert the data from a dictionary to json to be able to encode it.
    #encoding is converting data from one format into another usually into bytes, which is the raw format
    #computers and services like Kinesis expect. It is stated in the documentation of the function put_records that
    #data should be in bytes
    #Creating the partition key is necessary since all data records with the same partition key
    #map to the same shard(partition) within the stream(topic).
    for record in records:
        encoded_records.append(
                                {'Data': json.dumps(record).encode('utf-8'),
                                'PartitionKey': record['symbol']
                                    }
                                    )
    
    #send the batch of records to Kinesis
    response = kinesis_client.put_records(Records = encoded_records, StreamName = 'my_first_data_stream')
    
    return response
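Kinesis decides which shard a record lands on by taking the MD5 hash of its partition key, reading it as a 128-bit integer, and finding the shard whose hash key range contains that value. The sketch below mimics this under the assumption of four shards with evenly split ranges. The real ranges of a stream are reported by the service, and the helper names here are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces 128-bit values


def partition_key_hash(partition_key):
    """Return the MD5 hash of the partition key as an integer."""
    digest = hashlib.md5(partition_key.encode('utf-8')).digest()
    return int.from_bytes(digest, 'big')


def assumed_shard_index(partition_key, shard_count=4):
    """Map a key to a shard index, ASSUMING evenly split hash key ranges."""
    return partition_key_hash(partition_key) * shard_count // HASH_SPACE


for symbol in ['NVDA', 'AAPL', 'MSFT']:
    print(symbol, assumed_shard_index(symbol))
```

Because the symbol is used as the partition key, all prices for one company go to the same shard, which keeps them in order within that shard.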

In [79]:
write_records_to_stream(records)

{'FailedRecordCount': 0,
 'Records': [{'SequenceNumber': '49662652658056415759625962697078187108115376095384043538',
   'ShardId': 'shardId-000000000001'},
  {'SequenceNumber': '49662652658078716504824493320213678197289951311016493090',
   'ShardId': 'shardId-000000000002'},
  {'SequenceNumber': '49662652658078716504824493320214887123109565940191199266',
   'ShardId': 'shardId-000000000002'},
  {'SequenceNumber': '49662652658034115014427432073941487093121186250576887810',
   'ShardId': 'shardId-000000000000'},
  {'SequenceNumber': '49662652658101017250023023943350378212284141155823648818',
   'ShardId': 'shardId-000000000003'},
  {'SequenceNumber': '49662652658101017250023023943352796063923370414173061170',
   'ShardId': 'shardId-000000000003'},
  {'SequenceNumber': '49662652658101017250023023943354004989742985043347767346',
   'ShardId': 'shardId-000000000003'},
  {'SequenceNumber': '49662652658078716504824493320216096048929180569365905442',
   'ShardId': 'shardId-000000000002'},
  {'
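put_records is not all-or-nothing: a call can succeed overall while individual records are rejected. In that case FailedRecordCount is greater than zero, and the failed entries in Records carry an ErrorCode and ErrorMessage instead of a SequenceNumber. Entries in the response are in the same order as the records in the request. A minimal sketch of picking out the records to resend, using hand-written data rather than a real response (the helper name failed_records is hypothetical):

```python
def failed_records(request_records, response):
    """Return the request entries whose matching response entry has an ErrorCode."""
    return [
        request_entry
        for request_entry, result in zip(request_records, response['Records'])
        if 'ErrorCode' in result
    ]


# Hand-written example data, not output from a real call
request_records = [
    {'Data': b'{"symbol": "NVDA", "price": 140}', 'PartitionKey': 'NVDA'},
    {'Data': b'{"symbol": "AAPL", "price": 230}', 'PartitionKey': 'AAPL'},
]
response = {
    'FailedRecordCount': 1,
    'Records': [
        {'SequenceNumber': '1', 'ShardId': 'shardId-000000000000'},
        {'ErrorCode': 'ProvisionedThroughputExceededException',
         'ErrorMessage': 'Rate exceeded for shard (illustrative message)'},
    ],
}

to_retry = failed_records(request_records, response)
print(len(to_retry), to_retry[0]['PartitionKey'])
```

The list returned by the helper can be passed to put_records again, ideally after a short pause.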

In [81]:
# import boto3

# session = boto3.session.Session()
# print("AWS profile used:", session.profile_name)
# print("Region:", session.region_name)

# creds = session.get_credentials()
# print("Access key:", creds.access_key)
# print("Secret key:", creds.secret_key[:4] + "..." if creds else "None")
