<a href="https://colab.research.google.com/github/TyMill/experiments_ideas/blob/main/rl_eg.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This code uses the SARSA algorithm to train an agent to determine the optimal price for a product based on the prices of its competitors. The environment is defined by the `Environment` class, which takes in a list of prices representing the competition's prices. The agent, defined by the `SARSAgent` class, interacts with the environment by choosing an action (i.e. setting a price) and receiving a reward based on the profit made from that action. The agent learns from this experience by updating its Q-values for the current state-action pair using the SARSA update rule. After training for a specified number of episodes, the code prints out the optimal price determined by the agent.
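
The SARSA update rule mentioned above can be sketched as a single step on a plain dictionary Q-table. This is only an illustration: the helper name `sarsa_update`, the example states, and the reward value are assumptions made for the sketch, while the default `alpha` and `gamma` mirror the `SARSAgent` defaults.

```python
def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.5, gamma=0.9):
    """Apply one SARSA update in place and return the new Q-value."""
    current = q.get((state, action), 0.0)
    # On-policy target: uses the action actually chosen in the next state.
    target = reward + gamma * q.get((next_state, next_action), 0.0)
    q[(state, action)] = current + alpha * (target - current)
    return q[(state, action)]


# Hypothetical transition: in state (0, 50) the agent takes action 2, earns a
# profit of 5000, lands in state (50, 60), and then chooses action 3 there.
q = {((50, 60), 3): 1000.0}
new_q = sarsa_update(q, state=(0, 50), action=2, reward=5000.0,
                     next_state=(50, 60), next_action=3)
print(new_q)
```

Unvisited state-action pairs default to 0, which matches how `get_q_value` initialises missing entries.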

Note: the price the agent learns may be lower than the competitors' prices. It is good practice to test the strategy with different price lists and different demand scenarios.
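
One scenario worth checking first is the environment exactly as it is coded in the cell below. There, demand is drawn from a normal distribution with mean 100 whatever price is chosen, and the unit cost is 10, so the expected profit is 100 × (price − 10). The sketch tabulates that value for each candidate price. `expected_profit` is a hypothetical helper, and the two constants are read off `Environment.transition`.

```python
MEAN_DEMAND = 100  # mean of the normal demand in Environment.transition
UNIT_COST = 10     # per-unit cost in Environment.transition


def expected_profit(price, mean_demand=MEAN_DEMAND, unit_cost=UNIT_COST):
    """Expected profit when demand does not depend on the price."""
    return mean_demand * (price - unit_cost)


prices = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
profits = {p: expected_profit(p) for p in prices}
best_price = max(profits, key=profits.get)

for p in prices:
    print(f"price={p:3d}  expected profit={profits[p]}")
print("Best price under these assumptions:", best_price)
```

Because demand is independent of price here, the highest price always maximises expected profit. A lower price can only become optimal once the demand model reacts to the price.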

In [None]:
import numpy as np

class Environment:
    def __init__(self, prices):
        self.prices = prices
        self.num_prices = len(prices)

    def transition(self, state, action):
        price = self.prices[action]
        demand = np.random.normal(100, 15)
        revenue = demand * price
        cost = demand * 10
        profit = revenue - cost
        next_state = (state[1], price)
        return (next_state, profit)

    def reset(self):
        return (0, self.prices[0])

class SARSAgent:
    def __init__(self, actions, epsilon=0.1, alpha=0.5, gamma=0.9):
        self.q_values = dict()
        self.actions = actions
        self.epsilon = epsilon
        self.alpha = alpha
        self.gamma = gamma

    def learn(self, state, action, next_state, reward, next_action):
        q_val = self.get_q_value(state, action)
        next_q_val = self.get_q_value(next_state, next_action)
        new_q_val = q_val + self.alpha * (reward + self.gamma * next_q_val - q_val)
        self.set_q_value(state, action, new_q_val)

    def act(self, state):
        if np.random.rand() < self.epsilon:
            return np.random.choice(self.actions)
        else:
            q_values = [self.get_q_value(state, a) for a in self.actions]
            return self.actions[np.argmax(q_values)]

    def get_q_value(self, state, action):
        if (state, action) not in self.q_values:
            self.q_values[(state, action)] = 0
        return self.q_values[(state, action)]

    def set_q_value(self, state, action, value):
        self.q_values[(state, action)] = value

if __name__ == "__main__":
    prices = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
    env = Environment(prices)
    # Actions are indices into `prices`, because Environment.transition looks up prices[action]
    agent = SARSAgent(list(range(len(prices))))
    num_episodes = 1000

    for episode in range(num_episodes):
        state = env.reset()
        action = agent.act(state)

        for t in range(100):
            next_state, reward = env.transition(state, action)
            next_action = agent.act(next_state)
            agent.learn(state, action, next_state, reward, next_action)
            state = next_state
            action = next_action

    # Read the learned Q-values of every action in the start state
    start_state = env.reset()
    q_values = [agent.get_q_value(start_state, a) for a in agent.actions]
    optimal_price = prices[np.argmax(q_values)]
    print("Optimal price:", optimal_price)


DataFrame from Excel files

In [None]:
import os
import pandas as pd

# Set the path to the folder containing the Excel files
path = 'path/to/folder'

# Create an empty list to store the DataFrames
data_frames = []

# Iterate through the files in the folder
for file_name in os.listdir(path):
    # Check if the file is an Excel file
    if file_name.endswith('.xlsx'):
        # Read the Excel file into a DataFrame
        df = pd.read_excel(os.path.join(path, file_name))
        # Append the DataFrame to the list
        data_frames.append(df)

# Concatenate all of the DataFrames together and renumber the rows
final_df = pd.concat(data_frames, ignore_index=True)


Dictionary from a CSV file

In [None]:
import pandas as pd

# Read the CSV file into a DataFrame
df = pd.read_csv('path/to/file.csv')

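# Convert the DataFrame to a dictionary keyed by column: {column: {index: value}}
data_dict = df.to_dict()

# Alternatively, convert it to a list with one dictionary per row: [{column: value}, ...]
data_records = df.to_dict(orient='records')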


Dictionary

In [None]:
# Map the keys 'EC1'..'EC5000' to the values 'Fandom1'..'Fandom5000'
my_dict = {f'EC{i}': f'Fandom{i}' for i in range(1, 5001)}


Julia XGBoost

In [None]:
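using XGBoost
using RDatasets
using Random: shuffle
using Statistics: mean

# Load the iris dataset
iris = dataset("datasets", "iris")

# Split the dataset into training and testing sets (80/20, without replacement)
indices = shuffle(1:nrow(iris))
n_train = round(Int, 0.8 * nrow(iris))
train = iris[indices[1:n_train], :]
test = iris[indices[n_train+1:end], :]

# XGBoost expects numeric class labels starting at 0
classes = unique(string.(iris.Species))
encode(species) = findfirst(==(string(species)), classes) - 1

# Define the training data and labels
train_data = Matrix(train[:, 1:4])
train_labels = encode.(train.Species)

# Define the test data and labels
test_data = Matrix(test[:, 1:4])
test_labels = encode.(test.Species)

# Train the model (call style of XGBoost.jl 2.x; older releases use a different signature)
model = xgboost((train_data, train_labels); num_round = 10, objective = "multi:softprob", num_class = 3)

# Make predictions on the test data (assumed layout: one row per sample, one column per class)
predictions = predict(model, test_data)

# Evaluate the model
predicted_labels = [argmax(row) - 1 for row in eachrow(predictions)]
accuracy = mean(predicted_labels .== test_labels)
println("Accuracy: $accuracy")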





using MLJ
using Statistics: mean

# Load the XGBoost classifier through MLJ (requires the MLJXGBoostInterface package)
XGBoostClassifier = @load XGBoostClassifier pkg=XGBoost verbosity=0

# Define the XGBoost model
model = XGBoostClassifier()

# Define the hyperparameter grid
parameter_grid = [
    range(model, :eta, values = [0.1, 0.01, 0.001]),
    range(model, :max_depth, values = [2, 3, 4, 5]),
    range(model, :subsample, values = [0.7, 0.8, 0.9]),
    range(model, :colsample_bytree, values = [0.7, 0.8, 0.9]),
    range(model, :alpha, values = [0.0, 1.0, 2.0]),
    range(model, :lambda, values = [0.0, 1.0, 2.0]),
    range(model, :num_round, values = [10, 20, 30]),
]

# Wrap the model in a grid search (3-fold cross-validation and log loss are assumed choices)
grid_search = TunedModel(
    model = model,
    tuning = Grid(),
    range = parameter_grid,
    resampling = CV(nfolds = 3),
    measure = log_loss,
)

# Fit the grid search to the training data; MLJ expects a feature table and categorical labels
mach = machine(grid_search, train[:, 1:4], train.Species)
MLJ.fit!(mach)

# Print the best hyperparameters
best_model = fitted_params(mach).best_model
println("Best model: ", best_model)

# Make predictions on the test data
predictions = predict_mode(mach, test[:, 1:4])

# Evaluate the model
accuracy = mean(predictions .== test.Species)
println("Accuracy: $accuracy")


This example uses the Q-learning algorithm to train an agent to cross a grid in the "FrozenLake-v0" environment from OpenAI Gym. Each episode starts in the top-left cell, and the agent picks the action with the highest Q-value after adding random noise that shrinks as training progresses. The Q-values are updated with the Q-learning rule, which combines the observed reward with the maximum Q-value of the next state. After training, the agent's performance is measured as the total reward it collects in one greedy test episode. The first cell below applies the same update rule to the pricing problem; the second cell is the FrozenLake example.

This is a basic Python example meant to demonstrate the core idea of reinforcement learning (RL). Many more sophisticated algorithms exist for other kinds of problems.
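
As a rough illustration of the Q-learning update rule described above, the sketch below applies one update to a small table stored as nested lists. The helper name `q_learning_update`, the table size, and the numbers are assumptions chosen for the example; the default `alpha` and `gamma` mirror the hyperparameters of the FrozenLake cell.

```python
def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.8, gamma=0.95):
    """Apply one Q-learning update in place and return the new Q-value."""
    # Off-policy target: uses the best Q-value available in the next state,
    # regardless of which action the agent will actually take there.
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


# Hypothetical table with 3 states and 2 actions.
q_table = [[0.0, 0.0], [0.5, 1.0], [0.0, 0.0]]
new_q = q_learning_update(q_table, state=0, action=1, reward=0.0, next_state=1)
print(new_q)
```

The only difference from SARSA is the target: Q-learning takes the maximum over the next state's actions, whereas SARSA uses the Q-value of the action the policy actually selects next.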

In [None]:
import random
import numpy as np
from collections import defaultdict

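class MyEnv:
    def __init__(self, prices, demand_params, max_steps=100):
        self.prices = prices
        self.demand_params = demand_params  # (mean, std) of the normally distributed demand
        self.max_steps = max_steps          # assumed cap on the length of an episode
        self.reset()

    def transition(self, state, action):
        price = self.prices[action]
        demand = np.random.normal(self.demand_params[0], self.demand_params[1])
        revenue = price * demand
        self.steps += 1
        # Demand does not depend on history, so the problem has a single state (None)
        next_state = self.state
        return next_state, revenue

    def termination(self, state):
        # End the episode once the step cap is reached
        return self.steps >= self.max_steps

    def reset(self):
        self.state = None
        self.steps = 0
        return self.state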
    
class QLearningAgent:
    def __init__(self, actions, epsilon=0.1, alpha=0.5, gamma=0.9):
        self.actions = list(actions)
        # One Q-value per action for every state, created on first access
        self.q = defaultdict(lambda: [0.0] * len(self.actions))
        self.epsilon = epsilon
        self.alpha = alpha
        self.gamma = gamma

    def learn(self, state, action, reward, next_state):
        max_q_next = max(self.q[next_state])
        q_val = self.q[state][action]
        q_val += self.alpha * (reward + self.gamma * max_q_next - q_val)
        self.q[state][action] = q_val

    def act(self, state, epsilon=None):
        if epsilon is None:
            epsilon = self.epsilon
        if random.random() < epsilon:
            return random.choice(self.actions)
        else:
            q_values = self.q[state]
            return self.actions[np.argmax(q_values)]

if __name__ == '__main__':
    prices = [10, 20, 30, 40, 50]
    demand_params = [100, 20]
    env = MyEnv(prices, demand_params)

    agent = QLearningAgent(range(len(prices)))

    num_episodes = 1000
    for episode in range(num_episodes):
        state = env.reset()
        action = agent.act(state)
        total_reward = 0
        while True:
            next_state, reward = env.transition(state, action)
            agent.learn(state, action, reward, next_state)
            total_reward += reward

            action = agent.act(next_state)
            state = next_state

            if env.termination(state):
                break
    print(agent.q)
    optimal_price = prices[np.argmax(agent.q[None])]
    print(f'Optimal price: {optimal_price}')


In [None]:
import gym
import numpy as np

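# Create the environment ('FrozenLake-v0' and the reset()/step() return values used
# below follow older gym releases; newer gym/gymnasium versions register 'FrozenLake-v1'
# and return (observation, info) from reset() and a 5-tuple from step())
env = gym.make('FrozenLake-v0')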

# Define the Q-table and its initial values
q_table = np.zeros([env.observation_space.n, env.action_space.n])

# Define the hyperparameters
num_episodes = 10000
learning_rate = 0.8
max_steps = 99
gamma = 0.95

# Train the agent
window_rewards = 0  # reward accumulated since the last progress report
for episode in range(num_episodes):
    state = env.reset()
    done = False
    for step in range(max_steps):
        # Choose the greedy action after adding noise that decays with the episode number
        action = np.argmax(q_table[state, :] + np.random.randn(1, env.action_space.n) * (1. / (episode + 1)))

        # Take the action and observe the new state, reward, and whether the episode is done
        new_state, reward, done, _ = env.step(action)

        # Update the Q-value for the current state-action pair
        q_table[state, action] = q_table[state, action] + learning_rate * (reward + gamma * np.max(q_table[new_state, :]) - q_table[state, action])
        window_rewards += reward
        state = new_state
        if done:
            break

    # Report the average episode reward over each block of 1000 episodes
    if (episode + 1) % 1000 == 0:
        print("Average reward over the last 1000 episodes:", window_rewards / 1000)
        window_rewards = 0

# Test the agent
state = env.reset()
done = False
rewards = 0
for step in range(max_steps):
    # Choose the action with the highest Q-value
    action = np.argmax(q_table[state, :])
    state, reward, done, _ = env.step(action)
    rewards += reward
    if done:
        break
print("Total reward:", rewards)
