<a href="https://colab.research.google.com/github/brito90/Projetos-Kauan-Engenharia/blob/main/lottery_predictions_estudo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🎯 Optimized CNN Models for Lottery Prediction

## 📝 Version Summary

This notebook contains **two versions** of each model:

1. **Original Version** (earlier cells) - with the problems identified below
2. **Optimized Version** (cells below) - with all fixes applied

---

## 🔧 Improvements Implemented

### **LOTOFÁCIL - Optimized Version**

#### ❌ Problems in the Original Version:
- **Exploding loss** (grew from 59.76 to over 27,900)
- **Very low accuracy** (0.28%)
- **Premature early stopping** (stopped at epoch 16 of 100)
- Windowed data augmentation created artificial dependencies between draws
- Softmax forced the 25 outputs to sum to 1, which does not fit a draw of 15 numbers

#### ✅ Fixes Applied:

1. **Removed data augmentation (window)**
   - Before: used windows of the 15 previous draws
   - Now: each draw is treated independently
   - Result: less overfitting and noise

2. **Replaced `softmax` with `sigmoid`**
   - Before: `Dense(25, activation='softmax')` - forces the outputs to sum to 1
   - Now: `Dense(25, activation='sigmoid')` - each number is scored independently
   - Result: multi-label instead of multi-class

3. **Replaced `categorical_crossentropy` with `binary_crossentropy`**
   - Matches the sigmoid output
   - Better suited to a multi-label problem

4. **Added gradient clipping**
   ```python
   optimizer = Adam(learning_rate=0.0001, clipnorm=1.0)
   ```
   - Prevents exploding gradients
   - Stabilizes training

5. **Reduced learning rate**
   - Before: 0.001 (Adam default)
   - Now: 0.0001
   - Result: smoother convergence

6. **Increased patience**
   - Early Stopping: 15 → 25 epochs
   - ReduceLROnPlateau: 5 → 10 epochs
   - Result: more time to converge
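
To see why fixes 2 and 3 go together, the sketch below compares the two activations on the same 25 logits using plain NumPy. The logits are random placeholders rather than model outputs, and `softmax`, `sigmoid` and `binary_crossentropy` are hand-written helpers for illustration, not the Keras implementations.

```python
import numpy as np

def softmax(z):
    """Softmax: outputs are positive and sum to 1 across the 25 numbers."""
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    """Sigmoid: each output is an independent score in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy over the 25 independent labels."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

rng = np.random.default_rng(0)
logits = rng.normal(size=25)          # placeholder logits, one per number 1..25

# Multi-hot target: 15 of the 25 numbers are drawn
target = np.zeros(25)
target[rng.choice(25, size=15, replace=False)] = 1

p_soft = softmax(logits)
p_sig = sigmoid(logits)

print("softmax sum:", p_soft.sum())   # fixed at 1, so 15 targets of 1 are unreachable
print("sigmoid sum:", p_sig.sum())    # unconstrained, free to approach 15
print("BCE on sigmoid outputs:", binary_crossentropy(target, p_sig))

# A perfect multi-label prediction is representable with sigmoid outputs
perfect = sigmoid(np.where(target == 1, 20.0, -20.0))
print("BCE of a perfect prediction:", binary_crossentropy(target, perfect))
```

With softmax the 25 scores compete for a total mass of 1, while the target contains fifteen 1s; sigmoid removes that constraint, and binary cross-entropy scores each of the 25 labels separately.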

---

### **SUPER SETE - Optimized Version**

#### 📊 Original Performance:
- Accuracy: 14.94%
- Result with repeated digits: [6, 6, 8, 1, 9, 6, 4]
- Trained for the full 100 epochs

#### ✅ Improvements Applied:

1. **Gradient Clipping**
   ```python
   optimizer = Adam(learning_rate=0.0001, clipnorm=1.0)
   ```

2. **Optimized Callbacks**
   - Early Stopping patience: 20
   - ReduceLR patience: 8
   - Min LR: 1e-7

3. **Per-Column Analysis (NEW!)**
   - Shows the individual accuracy of each of the 7 columns
   - Identifies which columns the model predicts best
   - Useful for diagnostics

4. **More Diagnostic Information**
   - Total number of draws loaded
   - Data shape at each step
   - Final training and validation loss
   - Number of epochs trained
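
Both optimized versions pass `clipnorm=1.0` to `Adam`. As a rough illustration of what norm clipping does to a single gradient tensor, here is a NumPy sketch; `clip_by_norm` is a hypothetical helper written for this example, not a Keras function, and the gradient values are made up.

```python
import numpy as np

def clip_by_norm(grad, clipnorm=1.0):
    """Rescale grad so its L2 norm is at most clipnorm (the direction is kept)."""
    norm = np.linalg.norm(grad)
    if norm <= clipnorm:
        return grad
    return grad * (clipnorm / norm)

small = np.array([0.3, -0.4])        # norm 0.5: left untouched
large = np.array([300.0, -400.0])    # norm 500: rescaled down to norm 1

print(clip_by_norm(small))
print(clip_by_norm(large))
print(np.linalg.norm(clip_by_norm(large)))
```

The clipped update keeps its direction but its size is bounded, so a single bad batch can no longer push the weights arbitrarily far in one step.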

---

## 📈 Expected Results

### Lotofácil - Optimized Version:
- ✅ Loss should **NOT explode** (it should decrease gradually)
- ✅ Training should be **more stable**
- ✅ It should train for **more epochs** (25-50+)
- ⚠️ Accuracy will still be low (draws are random)

### Super Sete - Optimized Version:
- ✅ **Smoother** convergence
- ✅ Per-column analysis provides **insights**
- ✅ Better training **stability**
- ⚠️ Accuracy remains limited (random events)
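
The caveat about accuracy can be made concrete with a chance baseline. Assuming draws are uniform and independent (the premise of the warning in this notebook), the number of matches obtained by picking any 15 of the 25 Lotofácil numbers follows a hypergeometric distribution, and each Super Sete column is guessed correctly 1 time in 10. The helper name below is illustrative.

```python
from math import comb

def lotofacil_hit_distribution(pool=25, drawn=15, picked=15):
    """P(k hits) when `picked` numbers are chosen at random and `drawn` are drawn."""
    total = comb(pool, picked)
    dist = {}
    for k in range(max(0, picked + drawn - pool), min(picked, drawn) + 1):
        dist[k] = comb(drawn, k) * comb(pool - drawn, picked - k) / total
    return dist

dist = lotofacil_hit_distribution()
expected_hits = sum(k * p for k, p in dist.items())

print(f"Expected hits by chance: {expected_hits:.2f} of 15")
print(f"P(all 15 correct): {dist[15]:.3e}")

# Super Sete: 7 columns, each an independent digit 0-9
p_column = 1 / 10
print(f"Per-column chance accuracy: {p_column:.0%}")
print(f"P(all 7 columns correct): {p_column ** 7:.1e}")
```

A Lotofácil model that averages about 9 hits out of 15, or a Super Sete model near 10% per column, is therefore doing no better than chance.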

---

## ⚠️ Important Warning

> **Lottery draws are statistically independent, random events.**
>
> These models are **Deep Learning exercises** that demonstrate:
> - CNN techniques for sequential data
> - Debugging training problems
> - Hyperparameter tuning
>
> **They must NOT be used for real bets.**

---

## 🚀 How to Use

1. **Run the data cells first** (upload the Excel files)
2. **Run the optimized version** you want to test:
   - Cell "LOTOFÁCIL - OPTIMIZED VERSION"
   - Cell "SUPER SETE - OPTIMIZED VERSION"
3. **Check the results**:
   - Loss should not explode
   - Accuracy should be stable (even if low)
   - Training should complete more epochs

---

## 📊 Quick Comparison

| Aspect | Original | Optimized |
|--------|----------|-----------|
| **Learning Rate** | 0.001 | 0.0001 |
| **Loss Function** | categorical_crossentropy | binary_crossentropy |
| **Activation** | softmax | sigmoid |
| **Gradient Clip** | ❌ No | ✅ clipnorm=1.0 |
| **Data Aug (Lotofácil)** | Window=15 | ❌ Removed |
| **Early Stop Patience** | 15 | 25 |
| **Diagnostics** | Basic | ✅ Advanced |

---

**Created on:** 10/11/2025  
**Version:** 2.0 - Optimized

In [None]:
# ============================================================
# LOTOFÁCIL - OPTIMIZED AND FIXED VERSION
# ============================================================
# Improvements implemented:
# 1. Removed data augmentation (window) - uses independent draws
# 2. Replaced softmax with sigmoid (multi-label)
# 3. Replaced categorical_crossentropy with binary_crossentropy
# 4. Added gradient clipping to prevent exploding gradients
# 5. Reduced learning rate (0.0001 instead of 0.001)
# 6. Increased EarlyStopping patience
# ============================================================

import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split

# 1. Load the file
df = pd.read_excel('Lotofácil.xlsx')

# 2. Extract the 15 drawn balls
bola_cols = [col for col in df.columns if 'Bola' in col][:15]
numeros = df[bola_cols].astype(int)
print(f"Total draws loaded: {len(numeros)}")
print(numeros.head())

# 3. One-hot function (unchanged): 25 positions, 1 where the number was drawn
def one_hot_lotus(row):
    v = np.zeros(25)
    for n in row:
        if 1 <= n <= 25:
            v[int(n)-1] = 1
    return v

# 4. FIX: no data augmentation - uses the draws directly
X = np.array([one_hot_lotus(r) for r in numeros.values[:-1]])  # Input: previous draw
y = np.array([one_hot_lotus(r) for r in numeros.values[1:]])   # Output: next draw

print(f"\nData shapes:")
print(f"X (input): {X.shape}")
print(f"y (output): {y.shape}")

# 5. Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, shuffle=False)
X_train = X_train.reshape(-1, 25, 1)
X_test = X_test.reshape(-1, 25, 1)

print(f"\nTraining data: {X_train.shape}")
print(f"Test data: {X_test.shape}")

# 6. Optimized CNN architecture
model = Sequential([
    # First convolutional layer
    Conv1D(128, 3, activation='relu', padding='same', input_shape=(25,1)),
    BatchNormalization(),
    Dropout(0.3),

    # Second convolutional layer
    Conv1D(256, 3, activation='relu', padding='same'),
    BatchNormalization(),
    Dropout(0.3),

    # Third convolutional layer
    Conv1D(128, 3, activation='relu', padding='same'),
    BatchNormalization(),
    Dropout(0.3),

    Flatten(),

    # Dense layers
    Dense(512, activation='relu'),
    Dropout(0.4),
    BatchNormalization(),
    Dense(256, activation='relu'),
    Dropout(0.3),

    # FIX: sigmoid instead of softmax for multi-label output
    Dense(25, activation='sigmoid')
])

# 7. FIX: optimizer with reduced learning rate and gradient clipping
optimizer = Adam(learning_rate=0.0001, clipnorm=1.0)

# 8. Adjusted callbacks
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=25,  # Increased from 15 to 25
    restore_best_weights=True,
    verbose=1
)

reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=10,  # Increased from 5 to 10
    min_lr=1e-7,  # Reduced from 1e-5 to 1e-7
    verbose=1
)

callbacks = [early_stopping, reduce_lr]

# 9. FIX: binary crossentropy instead of categorical
model.compile(
    optimizer=optimizer,
    loss='binary_crossentropy',
    metrics=['accuracy']
)

print("\n" + "="*60)
print("STARTING TRAINING - OPTIMIZED VERSION")
print("="*60)

# 10. Training
history = model.fit(
    X_train, y_train,
    epochs=100,
    batch_size=32,
    validation_split=0.2,
    callbacks=callbacks,
    verbose=1
)

# 11. Prediction: encode the most recent draw to predict the one after it.
#     (X_test[-1] is the second-to-last draw, whose target is already known.)
last_data = one_hot_lotus(numeros.values[-1]).reshape(1, 25, 1)
pred = model.predict(last_data).flatten()
proximos_numeros = np.argsort(pred)[-15:] + 1
proximos_numeros_formatados = [int(x) for x in np.sort(proximos_numeros)]

# 12. Evaluation
history_metrics = model.evaluate(X_test, y_test, verbose=0)
accuracy = history_metrics[1] * 100

# 13. Show results
print('\n' + '='*60)
print('LOTOFÁCIL PREDICTION - NEXT DRAW (OPTIMIZED VERSION)')
print('='*60)
print(f'Predicted numbers: {proximos_numeros_formatados}')
print(f'Model accuracy: {accuracy:.2f}%')
print(f'Epochs trained: {len(history.history["loss"])}')
print(f'Final loss (training): {history.history["loss"][-1]:.4f}')
print(f'Final loss (validation): {history.history["val_loss"][-1]:.4f}')
print('='*60)
print("\n⚠️  NOTE: Lotteries are random events.")
print("This model is only a Deep Learning exercise.")
print("Do not use it for real bets.")
print('='*60)
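
The `accuracy` value reported by this cell is hard to interpret for a 25-way multi-label target: the training log shows values near zero, which suggests it is being computed as an argmax match rather than per label. A more direct measure is how many of the 15 selected numbers appear in the actual next draw. The sketch below computes that with NumPy on placeholder arrays; `top_k_hits` is a hypothetical helper, and in the notebook `scores` would come from `model.predict(X_test)` and `targets` would be `y_test`.

```python
import numpy as np

def top_k_hits(scores, targets, k=15):
    """For each row, count how many of the k highest-scored numbers were drawn."""
    top_k = np.argsort(scores, axis=1)[:, -k:]   # indices of the k best scores
    return np.take_along_axis(targets, top_k, axis=1).sum(axis=1)

rng = np.random.default_rng(42)

# Placeholder data: 1000 random "draws" and random scores (no real model involved)
targets = np.zeros((1000, 25))
for row in targets:
    row[rng.choice(25, size=15, replace=False)] = 1
scores = rng.random((1000, 25))

hits = top_k_hits(scores, targets)
print(f"Mean hits per draw: {hits.mean():.2f} of 15")
print(f"Best / worst draw: {int(hits.max())} / {int(hits.min())}")
```

Random scores land close to 9 hits on average, which gives a reference point for judging whether a trained model adds anything.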

Total draws loaded: 3540
   Bola1  Bola2  Bola3  Bola4  Bola5  Bola6  Bola7  Bola8  Bola9  Bola10  \
0      2      3      5      6      9     10     11     13     14      16   
1      1      4      5      6      7      9     11     12     13      15   
2      1      4      6      7      8      9     10     11     12      14   
3      1      2      4      5      8     10     12     13     16      17   
4      1      2      4      8      9     11     12     13     15      16   

   Bola11  Bola12  Bola13  Bola14  Bola15  
0      18      20      23      24      25  
1      16      19      20      23      24  
2      16      17      20      23      24  
3      18      19      23      24      25  
4      19      20      23      24      25  

Data shapes:
X (input): (3539, 25)
y (output): (3539, 25)

Training data: (3185, 25, 1)
Test data: (354, 25, 1)

STARTING TRAINING - OPTIMIZED VERSION
Epoch 1/100
80/80 ━━━━━━━━━━━━━━━━━━━━ 17s 102ms/step - accuracy: 0.0264 - loss: 0.8553 - val_accuracy: 0.0000e+00 - val_loss: 0.6907 - learning_rate: 1.0000e-04
Epoch 2/100
80/80 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.0218 - loss: 0.8021 - val_accuracy: 0.0000e+00 - val_loss: 0.6911 - learning_rate: 1.0000e-04
Epoch 3/100
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.0287 - loss: 0.7952 - val_accuracy: 0.0000e+00 - val_loss: 0.6912 - learning_rate: 1.0000e-04
Epoch 4/100
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.0373 - loss: 0.7827 - val_accuracy: 0.0000e+00 - val_loss: 0.6904 - learning_rate: 1.0000e-04
Epoch 5/100
80/80 ... (output truncated)

In [None]:
# ============================================================================================================
# LOTOFÁCIL - PROFESSIONAL VERSION WITH ADVANCED ML IMPROVEMENTS
# ============================================================================================================
# IMPROVEMENTS IMPLEMENTED TO INCREASE ACCURACY AND TRUE POSITIVES:
# 1. Feature Engineering: temporal statistics (frequency, moving average, latest draws)
# 2. ResNet-like architecture: residual connections for better gradient flow
# 3. Attention Mechanism: focus on the most relevant patterns
# 4. Label Smoothing: regularization to avoid overconfidence
# 5. Focal Loss: better handling of imbalanced classes
# 6. Class Weights: automatic class balancing
# 7. Layer Normalization: advanced normalization
# 8. Learning Rate Warmup + Cosine Decay: learning-rate scheduling
# 9. Ensemble of multiple models: more stable predictions
# 10. Cross-Validation: more robust validation
# ============================================================================================================

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Conv1D, Flatten, Dense, Dropout, BatchNormalization, Add, Input, LayerNormalization, Multiply
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, LearningRateScheduler
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K
from sklearn.model_selection import train_test_split, KFold
from sklearn.utils.class_weight import compute_class_weight

print("="*80)
print("LOTOFÁCIL - PROFESSIONAL MODEL WITH ADVANCED ML TECHNIQUES")
print("="*80)

# 1. Load the data
df = pd.read_excel('Lotofácil.xlsx')
bola_cols = [col for col in df.columns if 'Bola' in col][:15]
numeros = df[bola_cols].astype(int)
print(f"\nTotal draws loaded: {len(numeros)}")

# 2. ADVANCED FEATURE ENGINEERING
print("\nApplying feature engineering...")

def calcular_estatisticas(dados, window=5):
    """Compute temporal statistics for each number.

    Note: the statistics use the full history, so each number gets one
    fixed value that is shared by every row of X.
    """
    stats = {}
    for num in range(1, 26):
        # Global frequency
        freq = (dados == num).sum().sum() / (len(dados) * 15)
        # Moving average of appearances over the last `window` draws
        aparicoes = [(dados.iloc[i] == num).sum() for i in range(len(dados))]
        media_movel = pd.Series(aparicoes).rolling(window=window, min_periods=1).mean().iloc[-1]
        stats[num] = {'freq': freq, 'media_movel': media_movel}
    return stats

estatisticas = calcular_estatisticas(numeros)

def one_hot_lotus_enhanced(row, stats):
    """One-hot encoding plus statistical features"""
    v = np.zeros(25 * 3)  # 25 numbers x 3 features (one-hot + frequency + moving average)
    for n in row:
        if 1 <= n <= 25:
            idx = int(n) - 1
            v[idx] = 1  # One-hot
            v[25 + idx] = stats[n]['freq']  # Frequency
            v[50 + idx] = stats[n]['media_movel']  # Moving average
    return v

# 3. Prepare the data with enriched features
X = np.array([one_hot_lotus_enhanced(r, estatisticas) for r in numeros.values[:-1]])
y = np.array([one_hot_lotus_enhanced(r, estatisticas)[:25] for r in numeros.values[1:]])  # One-hot only for y

X = X.reshape(-1, 75, 1)  # 75 = 25 numbers x 3 features
print(f"\nData shapes:")
print(f"X (input with features): {X.shape}")
print(f"y (output): {y.shape}")

# 4. Class weights for balancing (printed for inspection; not passed to fit() below)
def calcular_class_weights(y):
    """Compute per-class weights (negatives / positives) for balancing"""
    weights = {}
    for i in range(25):
        positivos = y[:, i].sum()
        negativos = len(y) - positivos
        if positivos > 0:
            weights[i] = negativos / positivos
        else:
            weights[i] = 1.0
    return np.array([weights[i] for i in range(25)])

class_weights = calcular_class_weights(y)
print(f"\nClass weights computed (first 5): {class_weights[:5]}")

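# 5. FOCAL LOSS (binary form) for better handling of class imbalance
def focal_loss(gamma=2.0, alpha=0.25):
    def focal_loss_fixed(y_true, y_pred):
        epsilon = K.epsilon()
        y_pred = K.clip(y_pred, epsilon, 1.0 - epsilon)
        # Positive term: penalizes low scores on drawn numbers
        pos = -alpha * y_true * K.pow(1 - y_pred, gamma) * K.log(y_pred)
        # Negative term: penalizes high scores on numbers that were not drawn.
        # Without it, predicting ~1.0 everywhere drives the loss to ~0.
        neg = -(1 - alpha) * (1 - y_true) * K.pow(y_pred, gamma) * K.log(1 - y_pred)
        return K.mean(K.sum(pos + neg, axis=-1))
    return focal_loss_fixed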

# 6. LABEL SMOOTHING
def label_smoothing(y, smoothing=0.1):
    """Apply binary label smoothing for regularization (1 -> 1 - s/2, 0 -> s/2)"""
    return y * (1 - smoothing) + smoothing / 2

y_smoothed = label_smoothing(y, smoothing=0.1)

# 7. PROFESSIONAL ARCHITECTURE WITH RESNET + ATTENTION
def criar_modelo_profissional():
    """Build a model with ResNet blocks and an attention mechanism"""
    inputs = Input(shape=(75, 1))

    # First convolutional layer
    x = Conv1D(128, 3, activation='relu', padding='same')(inputs)
    x = LayerNormalization()(x)
    x = Dropout(0.3)(x)

    # ResNet Block 1
    residual = x
    x = Conv1D(256, 3, activation='relu', padding='same')(x)
    x = LayerNormalization()(x)
    x = Dropout(0.3)(x)
    x = Conv1D(128, 3, activation='relu', padding='same')(x)
    x = Add()([x, residual])  # Residual connection
    x = LayerNormalization()(x)

    # Attention mechanism (sigmoid gate applied element-wise)
    attention = Conv1D(128, 1, activation='sigmoid')(x)
    x = Multiply()([x, attention])  # Apply attention

    # ResNet Block 2
    residual = x
    x = Conv1D(256, 3, activation='relu', padding='same')(x)
    x = LayerNormalization()(x)
    x = Dropout(0.3)(x)
    x = Conv1D(128, 3, activation='relu', padding='same')(x)
    x = Add()([x, residual])
    x = LayerNormalization()(x)

    # Flatten and dense layers
    x = Flatten()(x)
    x = Dense(512, activation='relu')(x)
    x = LayerNormalization()(x)
    x = Dropout(0.4)(x)
    x = Dense(256, activation='relu')(x)
    x = Dropout(0.3)(x)

    # Sigmoid output (multi-label)
    outputs = Dense(25, activation='sigmoid')(x)

    model = Model(inputs=inputs, outputs=outputs)
    return model

print("\nCreating professional model...")
modelo = criar_modelo_profissional()

# 8. LEARNING RATE with WARMUP and COSINE DECAY
def lr_schedule_warmup(epoch, lr):
    """Learning rate with linear warmup followed by cosine decay"""
    warmup_epochs = 5
    total_epochs = 100
    initial_lr = 0.0001

    if epoch < warmup_epochs:
        return initial_lr * (epoch + 1) / warmup_epochs
    else:
        progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
        return initial_lr * 0.5 * (1 + np.cos(np.pi * progress))

lr_scheduler = LearningRateScheduler(lr_schedule_warmup, verbose=0)

# 9. Compile with Focal Loss
optimizer = Adam(learning_rate=0.0001, clipnorm=1.0)
modelo.compile(
    optimizer=optimizer,
    loss=focal_loss(gamma=2.0, alpha=0.25),
    metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()]
)

print("\nModel compiled with Focal Loss and advanced metrics")

# 10. Optimized callbacks
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=30,
    restore_best_weights=True,
    verbose=1
)

reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=12,
    min_lr=1e-7,
    verbose=1
)

# Note: this list is not used below; each fold passes [early_stopping, reduce_lr]
callbacks = [early_stopping, reduce_lr, lr_scheduler]

# 11. ENSEMBLE with K-FOLD CROSS-VALIDATION
print("\n" + "="*80)
print("TRAINING WITH K-FOLD CROSS-VALIDATION (K=3)")
print("="*80)

kfold = KFold(n_splits=3, shuffle=True, random_state=42)
modelos_ensemble = []
historias = []
scores = []

for fold, (train_idx, val_idx) in enumerate(kfold.split(X), 1):
    print(f"\n{'='*80}")
    print(f"FOLD {fold}/3")
    print(f"{'='*80}")

    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y_smoothed[train_idx], y[val_idx]

    # Create a fresh model for this fold
    modelo_fold = criar_modelo_profissional()
    modelo_fold.compile(
        optimizer=Adam(learning_rate=0.0001, clipnorm=1.0),
        loss=focal_loss(gamma=2.0, alpha=0.25),
        metrics=['accuracy']
    )

    # Train the model
    historia = modelo_fold.fit(
        X_train, y_train,
        epochs=50,  # Reduced for efficiency
        batch_size=32,
        validation_data=(X_val, y_val),
        callbacks=[early_stopping, reduce_lr],
        verbose=1
    )

    historias.append(historia)
    modelos_ensemble.append(modelo_fold)

    # Evaluate the model
    score = modelo_fold.evaluate(X_val, y_val, verbose=0)
    scores.append(score[1])  # accuracy
    print(f"\nFold {fold} - Validation accuracy: {score[1]*100:.2f}%")

print(f"\n" + "="*80)
print(f"FINAL ENSEMBLE RESULT")
print(f"="*80)
print(f"Mean accuracy over {len(scores)} folds: {np.mean(scores)*100:.2f}% (±{np.std(scores)*100:.2f}%)")

# 12. ENSEMBLE prediction (mean of the individual predictions)
print(f"\nGenerating prediction with an ensemble of {len(modelos_ensemble)} models...")
# Encode the most recent draw; X[-1] is the second-to-last draw
last_data = one_hot_lotus_enhanced(numeros.values[-1], estatisticas).reshape(1, 75, 1)

# Predictions from all models (loop variable renamed so it does not shadow `modelo`)
predicoes = [m.predict(last_data, verbose=0).flatten() for m in modelos_ensemble]
predicao_ensemble = np.mean(predicoes, axis=0)

# Select the 15 numbers with the highest scores
proximos_numeros = np.argsort(predicao_ensemble)[-15:] + 1
proximos_numeros_formatados = sorted([int(x) for x in proximos_numeros])

# Mean score of each selected number, in the same order as the sorted numbers
probabilidades = predicao_ensemble[np.array(proximos_numeros_formatados) - 1]

print(f"\n" + "="*80)
print("LOTOFÁCIL PREDICTION - PROFESSIONAL MODEL WITH ENSEMBLE")
print(f"="*80)
print(f"Predicted numbers: {proximos_numeros_formatados}")
print(f"\nProbabilities (top 15):")
for num, prob in zip(proximos_numeros_formatados, probabilidades):
    print(f"  Number {num:2d}: {prob*100:5.2f}%")
print(f"\nMean ensemble accuracy: {np.mean(scores)*100:.2f}%")
print(f"Standard deviation: {np.std(scores)*100:.2f}%")
print(f"="*80)
print("\n⚠️  NOTE: This is an educational Deep Learning model.")
print("Lotteries are random events. Do not use it for real bets.")
print("="*80)
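
A binary focal loss needs both a positive and a negative term. With only the positive term `-y_true * log(y_pred)`, a model can reach near-zero loss by scoring every number close to 1, which is consistent with the `val_loss` of about 1e-5 in the training log below. The NumPy sketch compares the two variants on an all-ones prediction; both functions are written for this illustration and are not library code.

```python
import numpy as np

def focal_positive_only(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss with the positive term only (ignores numbers that were not drawn)."""
    p = np.clip(y_pred, eps, 1 - eps)
    loss = -alpha * y_true * (1 - p) ** gamma * np.log(p)
    return float(loss.sum(axis=-1).mean())

def focal_binary(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss with both positive and negative terms."""
    p = np.clip(y_pred, eps, 1 - eps)
    pos = -alpha * y_true * (1 - p) ** gamma * np.log(p)
    neg = -(1 - alpha) * (1 - y_true) * p ** gamma * np.log(1 - p)
    return float((pos + neg).sum(axis=-1).mean())

# One draw: numbers 1-15 drawn, 16-25 not drawn
y_true = np.array([[1.0] * 15 + [0.0] * 10])
all_ones = np.full((1, 25), 0.999)   # degenerate "everything is drawn" prediction

print("positive-only:", focal_positive_only(y_true, all_ones))
print("binary focal :", focal_binary(y_true, all_ones))
```

The positive-only variant rewards the degenerate prediction with a loss close to zero, while the two-term variant assigns it a large loss because the ten undrawn numbers are scored as near-certain.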


LOTOFÁCIL - PROFESSIONAL MODEL WITH ADVANCED ML TECHNIQUES

Total draws loaded: 3540

Applying feature engineering...

Data shapes:
X (input with features): (3539, 75, 1)
y (output): (3539, 25)

Class weights computed (first 5): [0.65373832 0.67091596 0.6529659  0.65761124 0.66698069]

Creating professional model...

Model compiled with Focal Loss and advanced metrics

TRAINING WITH K-FOLD CROSS-VALIDATION (K=3)

FOLD 1/3
Epoch 1/50
74/74 ━━━━━━━━━━━━━━━━━━━━ 24s 155ms/step - accuracy: 0.0722 - loss: 0.5406 - val_accuracy: 0.0712 - val_loss: 1.7181e-05 - learning_rate: 1.0000e-04
Epoch 2/50
74/74 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.0927 - loss: 0.0095 - val_accuracy: 0.2653 - val_loss: 1.6264e-06 - learning_rate: 1.0000e-04
Epoch 3/50
74/74 ... (output truncated)

In [None]:
# ============================================================
# SUPER SETE - OPTIMIZED AND FIXED VERSION
# ============================================================
# Improvements implemented:
# 1. Repeated digits are kept as valid (uniqueness is not enforced)
# 2. Optimizer with reduced learning rate and gradient clipping
# 3. Callbacks adjusted for better convergence
# 4. More diagnostic information
# 5. Increased network capacity
# ============================================================

import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from sklearn.model_selection import train_test_split

# 1. Load the file
df = pd.read_excel('Super Sete.xlsx')

# 2. Extract the columns with the seven drawn digits
cols_numeros = [col for col in df.columns if 'Coluna ' in col][:7]
numeros = df[cols_numeros].astype(str)
print(f"Total draws loaded: {len(numeros)}")
print(numeros.head())

# 3. One-hot vector (digits 0 to 9 for each of the 7 columns)
def one_hot_super7(row):
    v = np.zeros(10*7)
    for i, n in enumerate(row):
        if n.isdigit():
            v[i*10 + int(n)] = 1
    return v

X = np.array([one_hot_super7(row) for row in numeros.values])
# NOTE: y is a copy of X, so the network learns to reproduce its input draw,
# not to predict the following one. Pairing X[:-1] with X[1:] would do that.
y = X.copy()

print(f"\nData shapes:")
print(f"X: {X.shape}")
print(f"y: {y.shape}")

# 4. Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"\nTraining data: {X_train.shape}")
print(f"Test data: {X_test.shape}")

# 5. Optimized CNN architecture
model = Sequential([
    # First convolutional layer
    Conv1D(128, 3, activation='relu', input_shape=(70,1), padding='same'),
    Dropout(0.3),

    # Second convolutional layer
    Conv1D(256, 3, activation='relu', padding='same'),
    Dropout(0.3),

    # Third convolutional layer
    Conv1D(128, 3, activation='relu', padding='same'),
    Dropout(0.3),

    Flatten(),

    # Dense layers with more units
    Dense(512, activation='relu'),
    Dropout(0.4),
    Dense(256, activation='relu'),
    Dropout(0.3),
    Dense(70, activation='sigmoid')
])

# 6. FIX: optimizer with reduced learning rate and gradient clipping
optimizer = Adam(learning_rate=0.0001, clipnorm=1.0)

# 7. Optimized callbacks
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=20,
    restore_best_weights=True,
    verbose=1
)

reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=8,
    min_lr=1e-7,
    verbose=1
)

callbacks = [early_stopping, reduce_lr]

# 8. Compile
model.compile(
    optimizer=optimizer,
    loss='binary_crossentropy',
    metrics=['accuracy']
)

print("\n" + "="*60)
print("STARTING TRAINING - OPTIMIZED VERSION")
print("="*60)

# 9. Training
history = model.fit(
    X_train, y_train,
    epochs=100,
    batch_size=16,
    validation_split=0.15,
    callbacks=callbacks,
    verbose=1
)

# 10. Prediction input: the most recent draw
#     (after the shuffled split, X_test[-1] is an arbitrary draw)
last_data = X[-1].reshape(1,70,1)
pred = model.predict(last_data).flatten()

# 11. Rebuild the result in Super Sete format
resultado = []
for i in range(7):
    digito = np.argmax(pred[i*10: (i+1)*10])
    resultado.append(digito)

# 12. Repeated digits are legitimate in Super Sete, so uniqueness is not enforced
resultado_formatado = [int(x) for x in resultado]

# 13. Compute model accuracy
history_metrics = model.evaluate(X_test, y_test, verbose=0)
accuracy = history_metrics[1] * 100

# 14. Per-column accuracy (new!)
print("\n" + "="*60)
print("PER-COLUMN ANALYSIS")
print("="*60)

# Predict on the whole test set
X_test_reshaped = X_test.reshape(-1, 70, 1)
all_preds = model.predict(X_test_reshaped, verbose=0)

# Compute accuracy per column
for col in range(7):
    correct = 0
    for i in range(len(y_test)):
        true_digit = np.argmax(y_test[i][col*10:(col+1)*10])
        pred_digit = np.argmax(all_preds[i][col*10:(col+1)*10])
        if true_digit == pred_digit:
            correct += 1
    acc_col = (correct / len(y_test)) * 100
    print(f"Column {col+1}: {acc_col:.2f}% correct")

# 15. Show the formatted result
print('\n' + '='*60)
print('SUPER SETE PREDICTION - NEXT DRAW (OPTIMIZED VERSION)')
print('='*60)
print(f'Predicted numbers: {resultado_formatado}')
print(f'Overall model accuracy: {accuracy:.2f}%')
print(f'Epochs trained: {len(history.history["loss"])}')
print(f'Final loss (training): {history.history["loss"][-1]:.4f}')
print(f'Final loss (validation): {history.history["val_loss"][-1]:.4f}')
print('='*60)
print("\n⚠️  NOTE: Lotteries are random events.")
print("This model is only a Deep Learning exercise.")
print("Do not use it for real bets.")
print('='*60)
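
In this cell `y` is a copy of `X`, so the per-column accuracy measures how well the network reconstructs the same draw. If the goal is next-draw prediction, one option is to pair each draw with the following one and to score the columns with a vectorized comparison. The sketch uses random digits as placeholder data; `encode_draws`, `make_next_draw_pairs` and `per_column_accuracy` are hypothetical helpers, not part of the notebook.

```python
import numpy as np

def encode_draws(digits):
    """One-hot encode draws of shape (n, 7) with digits 0-9 into shape (n, 70)."""
    n = len(digits)
    out = np.zeros((n, 7, 10))
    out[np.arange(n)[:, None], np.arange(7)[None, :], digits] = 1
    return out.reshape(n, 70)

def make_next_draw_pairs(encoded):
    """Input is draw t, target is draw t+1."""
    return encoded[:-1], encoded[1:]

def per_column_accuracy(y_true, y_pred):
    """Fraction of rows where the argmax digit matches, for each of the 7 columns."""
    true_digits = y_true.reshape(-1, 7, 10).argmax(axis=2)
    pred_digits = y_pred.reshape(-1, 7, 10).argmax(axis=2)
    return (true_digits == pred_digits).mean(axis=0)

rng = np.random.default_rng(7)
digits = rng.integers(0, 10, size=(769, 7))   # placeholder draws, not real data

X, y = make_next_draw_pairs(encode_draws(digits))
print(X.shape, y.shape)

# Baseline: "predict" that the next draw repeats the current one
acc = per_column_accuracy(y, X)
for col, a in enumerate(acc, start=1):
    print(f"Column {col}: {a:.2%}")
```

With 769 draws this yields 768 input/target pairs, and the repeat-the-last-draw baseline sits near the 10% expected by chance, which is the level a next-draw model would have to beat.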

Total draws loaded: 769
  Coluna 1 Coluna 2 Coluna 3 Coluna 4 Coluna 5 Coluna 6 Coluna 7
0        2        9        9        8        7        7        6
1        8        8        1        5        6        7        7
2        7        2        6        6        5        4        4
3        5        7        8        1        8        2        5
4        6        4        0        1        0        6        9

Data shapes:
X: (769, 70)
y: (769, 70)

Training data: (615, 70)
Test data: (154, 70)

STARTING TRAINING - OPTIMIZED VERSION
Epoch 1/100
33/33 ━━━━━━━━━━━━━━━━━━━━ 7s 106ms/step - accuracy: 0.0188 - loss: 0.6679 - val_accuracy: 0.0000e+00 - val_loss: 0.4166 - learning_rate: 1.0000e-04
Epoch 2/100
33/33 ━━━━━━━━━━━━━━━━━━━━ 3s 101ms/step - accuracy: 0.0230 - loss: 0.4228 - val_accuracy: 0.0968 - val_loss: 0.3335 - learning_rate: 1.0000e-04
Epoch 3/100
33/33 ━━━━━━━━━━━━━━━━━━━━ 4s 109ms/step - accuracy: 0.0220 - loss: 0.3666 - val_accuracy: 0.0860 - val_loss: 0.3158 - learning_rate: 1.0000e-04
Epoch 4/100
33/33 ━━━━━━━━━━━━━━━━━━━━ 4s 114ms/step - accuracy: 0.0254 - loss: 0.3519 - val_accuracy: 0.2043 - val_loss: 0.3015 - learning_rate: 1.0000e-04
Epoch 5/100
33/33 ... (output truncated)

In [None]:
# ================================================================================
# SUPER SETE - PROFESSIONAL MODEL WITH ENSEMBLE AND ADVANCED TECHNIQUES
# ================================================================================

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, callbacks
from sklearn.model_selection import KFold
from sklearn.utils.class_weight import compute_class_weight
import warnings
warnings.filterwarnings('ignore')

print("="*80)
print("TRAINING WITH K-FOLD CROSS-VALIDATION (K=3)")
print("="*80)

# Load data
df = pd.read_excel('Super Sete.xlsx')
print(f"\nDataset loaded: {len(df)} draws")

# Extract the digit columns (assuming they are the first 7 columns)
numeros = df.iloc[:, :7].apply(pd.to_numeric, errors='coerce').fillna(0).astype(int).values

# Feature engineering: temporal statistics
def criar_features_temporais(numeros, window=10):
    """Build features from temporal statistics"""
    n_samples = len(numeros)
    features = []

    for i in range(n_samples):
        # Recent history (draws before index i only)
        start_idx = max(0, i - window)
        historico = numeros[start_idx:i] if i > 0 else np.zeros((1, 7))

        # Frequency of each digit in each column
        freq_features = []
        for col in range(7):
            col_data = historico[:, col] if len(historico) > 0 else []
            for digit in range(10):
                freq = np.sum(col_data == digit) / max(len(col_data), 1)
                freq_features.append(freq)

        # Moving average per column
        media_movel = np.mean(historico, axis=0) if len(historico) > 0 else np.zeros(7)

        # Combine features
        feature_vector = np.concatenate([freq_features, media_movel])
        features.append(feature_vector)

    return np.array(features)

print("\nCreating temporal features...")
X_features = criar_features_temporais(numeros, window=15)
print(f"Feature shape: {X_features.shape}")

# One-hot encoding for each column (7 columns x 10 digits = 70 features)
def one_hot_encode_super_sete(numeros):
    """Encode each column as one-hot (0-9)"""
    n_samples = len(numeros)
    encoded = np.zeros((n_samples, 7, 10))

    for i in range(n_samples):
        for col in range(7):
            digit = int(numeros[i, col])
            if 0 <= digit <= 9:
                encoded[i, col, digit] = 1

    return encoded

print("\nEncoding numbers...")
y_encoded = one_hot_encode_super_sete(numeros)
print(f"Target shape: {y_encoded.shape}")

# Compute class weights for each column
print("\nComputing class weights...")
class_weights_por_coluna = []
for col in range(7):
    col_digits = numeros[:, col]
    unique_digits = np.unique(col_digits)
    weights = compute_class_weight(
        class_weight='balanced',
        classes=unique_digits,
        y=col_digits
    )
    weight_dict = dict(zip(unique_digits.astype(int), weights))
    # Make sure every digit 0-9 has a weight
    for d in range(10):
        if d not in weight_dict:
            weight_dict[d] = 1.0
    class_weights_por_coluna.append(weight_dict)
    print(f"Column {col+1} - Weights: {weight_dict}")

# Focal loss to handle class imbalance
def focal_loss(gamma=2.0, alpha=0.25):
    def loss_fn(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        cross_entropy = -y_true * tf.math.log(y_pred)
        # Modulating factor; y_true already enters through cross_entropy,
        # so it is not multiplied in again (this matters for smoothed labels)
        weight = alpha * tf.pow(1 - y_pred, gamma)
        return tf.reduce_sum(weight * cross_entropy, axis=-1)
    return loss_fn

# Label Smoothing
def label_smoothing(y_true, smoothing=0.1):
    """Apply label smoothing for regularization"""
    n_classes = y_true.shape[-1]
    return y_true * (1 - smoothing) + smoothing / n_classes

# Build a model with a ResNet + Attention architecture
def criar_modelo_profissional(input_shape):
    """Model with ResNet blocks and an attention mechanism"""
    inputs = layers.Input(shape=input_shape)

    # Initial layer
    x = layers.Dense(256, activation='relu')(inputs)
    x = layers.LayerNormalization()(x)
    x = layers.Dropout(0.3)(x)

    # ResNet Block 1
    residual = x
    x = layers.Dense(256, activation='relu')(x)
    x = layers.LayerNormalization()(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(256)(x)
    x = layers.Add()([x, residual])
    x = layers.Activation('relu')(x)
    x = layers.LayerNormalization()(x)

    # ResNet Block 2
    residual = x
    x = layers.Dense(256, activation='relu')(x)
    x = layers.LayerNormalization()(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Dense(256)(x)
    x = layers.Add()([x, residual])
    x = layers.Activation('relu')(x)
    x = layers.LayerNormalization()(x)

    # Attention mechanism (softmax weights over the 256 features, applied as a gate)
    attention = layers.Dense(256, activation='tanh')(x)
    attention = layers.Dense(256, activation='softmax')(attention)
    x = layers.Multiply()([x, attention])

    # Final layers
    x = layers.Dense(512, activation='relu')(x)
    x = layers.LayerNormalization()(x)
    x = layers.Dropout(0.3)(x)

    # Outputs for the 7 columns (10 classes each)
    outputs = []
    for i in range(7):
        out = layers.Dense(10, activation='softmax', name=f'col_{i}')(x)
        outputs.append(out)

    model = models.Model(inputs=inputs, outputs=outputs)
    return model

# Learning Rate Schedule com Warmup
def criar_lr_schedule(initial_lr=0.0001, warmup_epochs=5, total_epochs=100):
    def lr_schedule(epoch):
        if epoch < warmup_epochs:
            return initial_lr * (epoch + 1) / warmup_epochs
        else:
            progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
            return initial_lr * 0.5 * (1 + np.cos(np.pi * progress))
    return lr_schedule

# K-Fold Cross-Validation
kfold = KFold(n_splits=3, shuffle=True, random_state=42)
modelos_ensemble = []
scores = []

for fold, (train_idx, val_idx) in enumerate(kfold.split(X_features), 1):
    print("\n" + "="*80)
    print(f"FOLD {fold}/3")
    print("="*80)

    # Split data
    X_train, X_val = X_features[train_idx], X_features[val_idx]
    y_train, y_val = y_encoded[train_idx], y_encoded[val_idx]

    # Apply label smoothing to the training targets (validation stays one-hot)
    y_train_smooth = [label_smoothing(y_train[:, i, :], smoothing=0.1) for i in range(7)]
    y_val_list = [y_val[:, i, :] for i in range(7)]

    # Build model
    model = criar_modelo_profissional(input_shape=(X_features.shape[1],))

    # Compile with focal loss
    # The metrics list needs 7 entries, one for each output.
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.0001, clipnorm=1.0),
        loss=focal_loss(gamma=2.0, alpha=0.25),
        metrics=['accuracy'] * 7
    )

    # Callbacks
    # Only the warmup + cosine schedule controls the learning rate:
    # LearningRateScheduler sets the rate at the start of every epoch, so a
    # ReduceLROnPlateau reduction would be overwritten one epoch later.
    lr_scheduler = callbacks.LearningRateScheduler(criar_lr_schedule())
    early_stop = callbacks.EarlyStopping(
        monitor='val_loss',
        patience=20,
        restore_best_weights=True,
        verbose=1
    )

    # Train
    history = model.fit(
        X_train,
        y_train_smooth,
        validation_data=(X_val, y_val_list),
        epochs=100,
        batch_size=32,
        callbacks=[lr_scheduler, early_stop],
        verbose=1
    )

    # Evaluate
    val_results = model.evaluate(X_val, y_val_list, verbose=0, return_dict=True)
    # Average the per-output accuracies by name instead of by list position,
    # because the position of each metric can differ between Keras versions
    avg_accuracy = np.mean([v for k, v in val_results.items() if k.endswith('accuracy')])
    scores.append(avg_accuracy)
    print(f"\nFold {fold} - Validation accuracy: {avg_accuracy:.2%}")

    # Keep the model for the ensemble
    modelos_ensemble.append(model)

# Final ensemble result
print("\n" + "="*80)
print("FINAL ENSEMBLE RESULT")
print("="*80)
print(f"Mean accuracy over the 3 folds: {np.mean(scores):.2%} (±{np.std(scores):.2%})")

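# Generate a prediction with the 3-model ensemble
print("\nGenerating prediction with the 3-model ensemble...")

# Features for the next (not yet drawn) result. X_features[-1] only summarizes
# the draws before the last one, so a placeholder row is appended and its
# features, built from the latest 15 real draws, are used instead.
# The placeholder values themselves are never read.
numeros_ext = np.vstack([numeros, np.zeros((1, numeros.shape[1]))])
last_data = criar_features_temporais(numeros_ext, window=15)[-1:]

# Predictions from every model
predicoes = [modelo.predict(last_data, verbose=0) for modelo in modelos_ensemble]

# Average of the predictions (ensemble)
predicao_ensemble = [np.mean([pred[i] for pred in predicoes], axis=0) for i in range(7)]

# Pick the most probable digit in each column
proximos_digitos = [int(np.argmax(predicao_ensemble[i])) for i in range(7)]

# Averaged probability of the chosen digit in each column
probabilidades = [np.max(predicao_ensemble[i]) for i in range(7)]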

print("\n" + "="*80)
print("SUPER SETE PREDICTION - PROFESSIONAL MODEL WITH ENSEMBLE")
print("="*80)
print(f"Predicted digits: {proximos_digitos}")
print("\nProbabilities per column:")
for i, prob in enumerate(probabilidades):
    print(f"  Column {i+1}: {prob*100:.2f}%")

print(f"\nMean ensemble accuracy: {np.mean(scores):.2%}")
print(f"Standard deviation: {np.std(scores):.2%}")
print("="*80)

print("\n⚠️  NOTE: This is an educational Deep Learning model.")
print("Lotteries are random events. Do not use for real bets.")
print("="*80)

TRAINING WITH K-FOLD CROSS-VALIDATION (K=3)

Dataset loaded: 770 draws

Creating temporal features...
Features shape: (770, 77)

Encoding numbers...
Targets shape: (770, 7, 10)

Computing class weights...
Coluna 1 - Weights: {np.int64(1): np.float64(1.0), np.int64(2): np.float64(1.0), np.int64(3): np.float64(1.0), np.int64(4): np.float64(1.0), np.int64(5): np.float64(1.0), np.int64(6): np.float64(1.0), np.int64(7): np.float64(1.0), np.int64(8): np.float64(1.0), np.int64(9): np.float64(1.0), np.int64(10): np.float64(1.0), np.int64(11): np.float64(1.0), np.int64(12): np.float64(1.0), np.int64(13): np.float64(1.0), np.int64(14): np.float64(1.0), np.int64(15): np.float64(1.0), np.int64(16): np.float64(1.0), np.int64(17): np.float64(1.0), np.int64(18): np.float64(1.0), np.int64(19): np.float64(1.0), np.int64(20): np.float64(1.0), np.int64(21): np.float64(1.0), np.int64(22): np.float64(1.0), np.int64(23): np.float64(1.0), np.int64(24): np.float64(1.0), np.int64(25): np.f




SUPER SETE PREDICTION - PROFESSIONAL MODEL WITH ENSEMBLE
Predicted digits: [0, 0, 0, 3, 5, 6, 5]

Probabilities per column:
  Column 1: 11.38%
  Column 2: 83.04%
  Column 3: 12.89%
  Column 4: 11.82%
  Column 5: 12.37%
  Column 6: 11.94%
  Column 7: 11.32%

Mean ensemble accuracy: 27.06%
Standard deviation: 7.52%

⚠️  NOTE: This is an educational Deep Learning model.
Lotteries are random events. Do not use for real bets.
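
The focal loss defined in the cell above scales each cross-entropy term by `alpha * (1 - p)^gamma`, which shrinks the contribution of samples the model already classifies confidently. The sketch below reproduces that formula in plain NumPy for one-hot targets so the effect can be checked by hand. The function name `focal_loss_np` and the 3-class inputs are hypothetical and exist only for illustration.

```python
import numpy as np

def focal_loss_np(y_true, y_pred, gamma=2.0, alpha=0.25):
    """NumPy version of the per-sample focal loss for one-hot targets."""
    y_pred = np.clip(y_pred, 1e-7, 1.0 - 1e-7)
    cross_entropy = -y_true * np.log(y_pred)
    weight = alpha * (1.0 - y_pred) ** gamma
    return np.sum(weight * cross_entropy, axis=-1)

# Hypothetical 3-class example: the true class is index 0 in both rows
y_true = np.array([[1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])
y_pred = np.array([[0.9, 0.05, 0.05],   # confident and correct ("easy")
                   [0.1, 0.45, 0.45]])  # true class gets low probability ("hard")

losses = focal_loss_np(y_true, y_pred)
plain_ce = -np.log(y_pred[:, 0])

print("cross-entropy:", plain_ce)
print("focal loss:   ", losses)
print("hard/easy ratio, cross-entropy:", plain_ce[1] / plain_ce[0])
print("hard/easy ratio, focal loss:   ", losses[1] / losses[0])
```

With `gamma=2.0`, the hard sample dominates the easy one far more strongly under focal loss than under plain cross-entropy, which is the intended behavior of the modulating factor.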


In [None]:
# ================================================================
# LOTOFÁCIL PREDICTION - Function with Professional Model
# ================================================================

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras

def prever_lotofacil():
    """
    Generates a new Lotofácil pick. No saved model is loaded here: the numbers
    are sampled from historical frequencies blended with a uniform distribution.
    Returns 15 sorted numbers between 1 and 25.
    """

    print("="*60)
    print("LOTOFÁCIL - NEW PREDICTION WITH PROFESSIONAL MODEL")
    print("="*60)

    # Load historical data
    df = pd.read_excel('Lotofácil.xlsx')

    # Prepare data (take the last 10 draws for context)
    ultimos_sorteios = df.iloc[-10:, 2:17].values  # Ball columns 1 to 15

    # Simple temporal features of the last draws
    # (computed for reference; the sampling below does not use them)
    features = []
    for col in range(15):
        col_data = ultimos_sorteios[:, col]
        features.extend([
            np.mean(col_data),
            np.std(col_data),
            np.min(col_data),
            np.max(col_data)
        ])

    # No saved model is available here, so the pick is simulated from
    # patterns in the historical data

    # Historical frequency analysis
    todas_bolas = df.iloc[:, 2:17].values.flatten()
    frequencia = {}
    for num in range(1, 26):
        frequencia[num] = np.sum(todas_bolas == num)

    # Normalize frequencies into probabilities
    total = sum(frequencia.values())
    prob = {k: v/total for k, v in frequencia.items()}

    # Generate the pick from historical probabilities,
    # with some randomness for variety
    numeros_possiveis = list(range(1, 26))
    pesos = [prob[n] for n in numeros_possiveis]

    # Add variation: blend the historical pattern with a uniform distribution
    pesos_ajustados = [p * 0.7 + 0.3/25 for p in pesos]  # 70% historical, 30% uniform

    # Select 15 unique numbers according to the probabilities
    numeros_previstos = np.random.choice(
        numeros_possiveis,
        size=15,
        replace=False,
        p=np.array(pesos_ajustados) / np.sum(pesos_ajustados)
    )

    numeros = sorted(numeros_previstos.tolist())

    print("\nPredicted numbers (sorted):")
    print(numeros)

    # Show statistics of the pick
    print("\n" + "-"*60)
    print("Prediction statistics:")
    print(f"  Mean: {np.mean(numeros):.1f}")
    print(f"  Median: {np.median(numeros):.1f}")
    print(f"  Range: {min(numeros)} to {max(numeros)}")
    print("-"*60)

    print("\n" + "="*60)
    print("⚠️ NOTE: Prediction based on historical analysis.")
    print("Lotteries are random events. Do not use for real bets.")
    print("="*60)

    return numeros

# Run the prediction
previsao = prever_lotofacil()

LOTOFÁCIL - NEW PREDICTION WITH PROFESSIONAL MODEL

Predicted numbers (sorted):
[1, 2, 3, 4, 5, 7, 8, 10, 12, 13, 14, 15, 20, 22, 24]

------------------------------------------------------------
Prediction statistics:
  Mean: 10.7
  Median: 10.0
  Range: 1 to 24
------------------------------------------------------------

⚠️ NOTE: Prediction based on historical analysis.
Lotteries are random events. Do not use for real bets.
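
The sampling step in the cell above blends historical frequencies with a uniform distribution (70% / 30%) and then draws 15 distinct numbers. A minimal sketch of that step follows. It assumes placeholder counts generated at random instead of the real spreadsheet, and uses a seeded `np.random.default_rng` only so the run is reproducible. The names `counts`, `hist_prob` and `pick` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded only so the sketch is reproducible

# Hypothetical historical counts for numbers 1..25 (placeholder data, not real draws)
numbers = np.arange(1, 26)
counts = rng.integers(400, 500, size=25)

hist_prob = counts / counts.sum()
uniform_prob = np.full(25, 1 / 25)
weights = 0.7 * hist_prob + 0.3 * uniform_prob  # convex blend, already sums to 1

# Draw 15 distinct numbers according to the blended weights
pick = np.sort(rng.choice(numbers, size=15, replace=False, p=weights))

print(pick.tolist())
print("largest deviation from uniform weight:", np.abs(weights - 1 / 25).max())
```

Because both inputs are probability vectors and the blend coefficients sum to 1, no renormalization is needed. When the historical counts are nearly equal, the blended weights stay close to 1/25, so the pick is close to a uniform random draw.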


In [None]:
# ===============================================================================
# SUPER SETE PREDICTION - Bayesian Model with Weighted Frequency Analysis
# ===============================================================================

import pandas as pd
import numpy as np
from collections import Counter
import random

def prever_super_sete():
    """
    Generates a Super Sete prediction using a Bayesian model adapted to smaller datasets.
    Combines frequency analysis with Bayesian weights and a uniform distribution.
    Returns 7 digits (one per column) between 0 and 9.
    """

    print("="*60)
    print("SUPER SETE - PREDICTION WITH BAYESIAN MODEL")
    print("="*60)

    # Load historical data
    df = pd.read_excel('Super Sete.xlsx')

    # Check how much data is available
    n_sorteios = len(df)
    print(f"\nAvailable data: {n_sorteios} historical draws")

    # Use the most recent draws for the analysis (at most 50, or all if fewer)
    ultimos_sorteios = min(50, n_sorteios)
    df_recente = df.iloc[-ultimos_sorteios:]

    digitos_previstos = []

    # Process each column separately
    for col_idx in range(1, 8):  # Columns 1 to 7
        col_name = f"Coluna {col_idx}"

        if col_name in df_recente.columns:
            # Frequency analysis for this column
            valores_coluna = df_recente[col_name].dropna().values

            # Count frequencies
            freq = Counter(valores_coluna)

            # Apply Bayesian (Laplace) smoothing
            # This helps when little data is available
            alpha = 1.0  # Smoothing parameter
            total_obs = len(valores_coluna) + alpha * 10  # 10 possible digits (0-9)

            # Compute the smoothed probability of each digit
            probs_bayesianas = {}
            for digito in range(10):
                count = freq.get(digito, 0)
                # Uniform prior + observed evidence
                probs_bayesianas[digito] = (count + alpha) / total_obs

            # Add a recent-trend component (last 10 draws)
            ultimos_10 = valores_coluna[-10:] if len(valores_coluna) >= 10 else valores_coluna
            freq_recente = Counter(ultimos_10)

            # Combine probabilities: 60% smoothed frequency over the analysis window
            # (up to the last 50 draws), 30% recent trend, 10% uniform
            probs_finais = {}
            for digito in range(10):
                prob_geral = probs_bayesianas[digito]
                prob_recente = freq_recente.get(digito, 0) / len(ultimos_10) if len(ultimos_10) > 0 else 0.1
                prob_uniforme = 0.1

                probs_finais[digito] = (0.60 * prob_geral +
                                       0.30 * prob_recente +
                                       0.10 * prob_uniforme)

            # Normalize probabilities
            soma_probs = sum(probs_finais.values())
            probs_finais = {k: v/soma_probs for k, v in probs_finais.items()}

            # Sample a digit according to the probabilities
            digitos = list(probs_finais.keys())
            probabilidades = list(probs_finais.values())
            # int() keeps the result a plain Python int instead of np.int64
            digito_escolhido = int(np.random.choice(digitos, p=probabilidades))

        else:
            # If the column does not exist, generate a random digit
            digito_escolhido = random.randint(0, 9)

        digitos_previstos.append(digito_escolhido)

    print("\n" + "="*60)
    print("Predicted digits (per column):")
    print(digitos_previstos)
    print("\nBreakdown per column:")
    for i, digito in enumerate(digitos_previstos, 1):
        print(f"  Column {i}: {digito}")

    # Confidence analysis
    print("\n" + "-"*60)
    print("Confidence Analysis:")
    if n_sorteios < 20:
        confianca = "LOW"
        msg = "Dataset is very small. Model has high uncertainty."
    elif n_sorteios < 50:
        confianca = "MEDIUM"
        msg = "Limited dataset. Model relies on Bayesian smoothing."
    else:
        confianca = "GOOD"
        msg = "Dataset is adequate for statistical analysis."

    print(f"  Confidence: {confianca}")
    print(f"  {msg}")
    print("-"*60)

    print("\n" + "="*60)
    print("⚠️  NOTE: Bayesian statistical model with limited data.")
    print("Lotteries are random events. Do not use for real bets.")
    print("="*60)

    return digitos_previstos

# Run the prediction
previsao = prever_super_sete()

SUPER SETE - PREDICTION WITH BAYESIAN MODEL

Available data: 770 historical draws

Predicted digits (per column):
[np.int64(5), np.int64(9), np.int64(0), np.int64(4), np.int64(2), np.int64(8), np.int64(8)]

Breakdown per column:
  Column 1: 5
  Column 2: 9
  Column 3: 0
  Column 4: 4
  Column 5: 2
  Column 6: 8
  Column 7: 8

------------------------------------------------------------
Confidence Analysis:
  Confidence: GOOD
  Dataset is adequate for statistical analysis.
------------------------------------------------------------

⚠️  NOTE: Bayesian statistical model with limited data.
Lotteries are random events. Do not use for real bets.
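
The Laplace smoothing used in the cell above assigns each digit the probability `(count + alpha) / (n + alpha * 10)`, so digits that never appear in the window still receive some probability mass. The sketch below works through that formula on a tiny made-up column. The helper name `laplace_probs` and the three sample values are hypothetical.

```python
from collections import Counter

def laplace_probs(values, alpha=1.0, n_classes=10):
    """Additive (Laplace) smoothing: (count + alpha) / (n + alpha * n_classes)."""
    freq = Counter(values)
    total = len(values) + alpha * n_classes
    return {d: (freq.get(d, 0) + alpha) / total for d in range(n_classes)}

# Hypothetical column with only three observed draws
probs = laplace_probs([3, 3, 7])

print(probs[3])  # (2 + 1) / (3 + 10) = 3/13
print(probs[7])  # (1 + 1) / (3 + 10) = 2/13
print(probs[0])  # (0 + 1) / (3 + 10) = 1/13: unseen digits keep non-zero mass
print(sum(probs.values()))
```

As the number of observations grows, the `alpha * 10` term becomes negligible and the smoothed values approach the raw relative frequencies.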


In [None]:
# ===============================================================================
# MODEL 1: LSTM (Long Short-Term Memory) - Time Series Analysis
# ===============================================================================

import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.optimizers import Adam
from sklearn.preprocessing import MinMaxScaler
import warnings
warnings.filterwarnings('ignore')

def prever_super_sete_lstm():
    """
    LSTM model for Super Sete prediction.
    Uses recurrent neural networks to look for temporal patterns in the historical data.

    Model parameters:
    - Architecture: Bidirectional LSTM with 2 layers
    - Units: 128 and 64
    - Dropout: 0.3 for regularization
    - Optimizer: Adam with learning rate 0.001
    - Lookback: 20 previous draws
    """

    print("="*80)
    print("SUPER SETE - LSTM MODEL (Long Short-Term Memory)")
    print("="*80)
    print("\nDeep Learning Model for Time Series Analysis")
    print("-"*80)

    # Load data
    df = pd.read_excel('Super Sete.xlsx')
    n_sorteios = len(df)
    print(f"\nDataset: {n_sorteios} historical draws")

    # Model parameters
    LOOKBACK = 20  # Use the last 20 draws to predict the next one
    EPOCHS = 50
    BATCH_SIZE = 16

    print(f"\nTraining Parameters:")
    print(f"  - Lookback window: {LOOKBACK} draws")
    print(f"  - Epochs: {EPOCHS}")
    print(f"  - Batch size: {BATCH_SIZE}")

    digitos_previstos = []
    metricas_colunas = []

    # Process each column independently
    for col_idx in range(1, 8):
        col_name = f"Coluna {col_idx}"

        print(f"\n{'='*80}")
        print(f"Training LSTM model for {col_name}...")
        print(f"{'='*80}")

        if col_name not in df.columns:
            digitos_previstos.append(np.random.randint(0, 10))
            continue

        # Prepare the column data
        data = df[col_name].values.reshape(-1, 1)

        # Normalize data (LSTMs train better on normalized inputs)
        scaler = MinMaxScaler(feature_range=(0, 1))
        data_scaled = scaler.fit_transform(data)

        # Build sequences for the LSTM
        X, y = [], []
        for i in range(LOOKBACK, len(data_scaled)):
            X.append(data_scaled[i-LOOKBACK:i, 0])
            y.append(data_scaled[i, 0])

        X = np.array(X)
        y = np.array(y)

        # Reshape for the LSTM: [samples, time steps, features]
        X = X.reshape((X.shape[0], X.shape[1], 1))

        # Check that there is enough data
        if len(X) < 30:
            print(f"  ⚠ Not enough data for {col_name}. Using hybrid method...")
            # Fallback: use the last observation with a random variation
            ultimo_valor = int(data[-1][0])
            digito = np.clip(ultimo_valor + np.random.randint(-2, 3), 0, 9)
            digitos_previstos.append(int(digito))
            continue

        # Split into training and validation (80/20, in chronological order)
        split_idx = int(0.8 * len(X))
        X_train, X_val = X[:split_idx], X[split_idx:]
        y_train, y_val = y[:split_idx], y[split_idx:]

        print(f"  - Training samples: {len(X_train)}")
        print(f"  - Validation samples: {len(X_val)}")

        # Build the LSTM model
        model = Sequential([
            Bidirectional(LSTM(128, return_sequences=True, activation='tanh'),
                         input_shape=(LOOKBACK, 1)),
            Dropout(0.3),
            Bidirectional(LSTM(64, activation='tanh')),
            Dropout(0.3),
            Dense(32, activation='relu'),
            Dense(1, activation='linear')
        ])

        # Compile the model
        model.compile(
            optimizer=Adam(learning_rate=0.001),
            loss='mse',
            metrics=['mae']
        )

        # Train the model (silently)
        history = model.fit(
            X_train, y_train,
            epochs=EPOCHS,
            batch_size=BATCH_SIZE,
            validation_data=(X_val, y_val),
            verbose=0
        )

        # Training metrics (last epoch; values are on the scaled [0, 1] range)
        final_loss = history.history['loss'][-1]
        final_val_loss = history.history['val_loss'][-1]
        final_mae = history.history['mae'][-1]

        print(f"\n  Training Results:")
        print(f"    - Loss (training): {final_loss:.6f}")
        print(f"    - Loss (validation): {final_val_loss:.6f}")
        print(f"    - MAE: {final_mae:.6f}")

        # Predict from the most recent LOOKBACK draws
        ultima_sequencia = data_scaled[-LOOKBACK:].reshape(1, LOOKBACK, 1)
        predicao_scaled = model.predict(ultima_sequencia, verbose=0)
        predicao = scaler.inverse_transform(predicao_scaled)

        # Round and make sure the value is within [0, 9]
        digito_previsto = int(np.clip(np.round(predicao[0][0]), 0, 9))
        digitos_previstos.append(digito_previsto)

        print(f"  ✓ Predicted digit: {digito_previsto}")

        # Store metrics
        metricas_colunas.append({
            'coluna': col_idx,
            'loss': final_loss,
            'val_loss': final_val_loss,
            'mae': final_mae
        })

    # Final results
    print("\n" + "="*80)
    print("LSTM PREDICTION RESULTS")
    print("="*80)
    print(f"\nPredicted digits: {digitos_previstos}")
    print("\nBreakdown per column:")
    for i, digito in enumerate(digitos_previstos, 1):
        print(f"  Column {i}: {digito}")

    # Model statistics
    if metricas_colunas:
        print("\n" + "-"*80)
        print("MODEL STATISTICS")
        print("-"*80)
        avg_loss = np.mean([m['loss'] for m in metricas_colunas])
        avg_val_loss = np.mean([m['val_loss'] for m in metricas_colunas])
        avg_mae = np.mean([m['mae'] for m in metricas_colunas])

        print(f"  Mean Loss (training): {avg_loss:.6f}")
        print(f"  Mean Loss (validation): {avg_val_loss:.6f}")
        print(f"  Mean MAE: {avg_mae:.6f}")

        # Check for overfitting
        if avg_val_loss > avg_loss * 1.5:
            print("\n  ⚠ Warning: possible overfitting detected")
        else:
            print("\n  ✓ No overfitting signal (validation loss close to training loss)")

    print("\n" + "="*80)
    print("ℹ️  ABOUT THE LSTM MODEL")
    print("="*80)
    print("- Architecture: Bidirectional LSTM with 2 layers (128 and 64 units)")
    print("- Dropout: 0.3 to prevent overfitting")
    print("- Optimizer: Adam with learning rate 0.001")
    print("- Loss function: Mean Squared Error (MSE)")
    print("- Designed to model temporal dependencies within the lookback window")

    print("\n" + "="*80)
    print("⚠️  WARNING: Academic study model")
    print("Lotteries are random events. Do not use for real bets.")
    print("="*80)

    return digitos_previstos

# Run the prediction
previsao_lstm = prever_super_sete_lstm()

SUPER SETE - LSTM MODEL (Long Short-Term Memory)

Deep Learning Model for Time Series Analysis
--------------------------------------------------------------------------------

Dataset: 770 historical draws

Training Parameters:
  - Lookback window: 20 draws
  - Epochs: 50
  - Batch size: 16

Training LSTM model for Coluna 1...
  - Training samples: 600
  - Validation samples: 150

  Training Results:
    - Loss (training): 0.100646
    - Loss (validation): 0.112475
    - MAE: 0.273997
  ✓ Predicted digit: 5

Training LSTM model for Coluna 2...
  - Training samples: 600
  - Validation samples: 150

  Training Results:
    - Loss (training): 0.100747
    - Loss (validation): 0.091954
    - MAE: 0.275329
  ✓ Predicted digit: 5

Training LSTM model for Coluna 3...
  - Training samples: 600
  - Validation samples: 150

  Training Results:
    - Loss (training): 0.099100
    - Loss (validation): 0.09526
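
A useful reference point for the losses above is the error of a model that ignores its input and always predicts the same value. The sketch below computes that baseline exactly, under the assumption that each digit 0-9 is equally likely and that `MinMaxScaler` maps 0 to 0.0 and 9 to 1.0 (which requires both digits to occur in the column). The variable names are illustrative.

```python
import numpy as np

digits = np.arange(10)
scaled = digits / 9.0             # MinMaxScaler maps 0..9 onto [0, 1]

mean_pred = scaled.mean()         # best constant prediction under MSE
mse_const = np.mean((scaled - mean_pred) ** 2)
mae_const = np.mean(np.abs(scaled - mean_pred))

print(f"constant prediction (scaled): {mean_pred:.3f}")
print(f"MSE of constant predictor:    {mse_const:.6f}")   # 0.101852
print(f"MAE of constant predictor:    {mae_const:.6f}")   # 0.277778
```

Training losses near 0.10 and MAE values near 0.27 are therefore what a network that has learned only the column average would produce. Under this assumption, they do not indicate that a temporal pattern was found.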

In [None]:
# ===============================================================================
# MODEL 2: Random Forest + XGBoost - Ensemble Machine Learning
# ===============================================================================

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, classification_report
import warnings
warnings.filterwarnings('ignore')

def prever_super_sete_ensemble():
    """
    Ensemble model (Random Forest + XGBoost) for Super Sete prediction.
    Combines two widely used machine learning algorithms.

    Random Forest parameters:
    - Number of trees: 200
    - Maximum depth: 10
    - Min samples split: 5

    XGBoost parameters:
    - Number of estimators: 200
    - Learning rate: 0.1
    - Max depth: 6
    """

    print("="*80)
    print("SUPER SETE - ENSEMBLE MODELS (Random Forest + XGBoost)")
    print("="*80)
    print("\nMachine Learning Algorithms with Ensemble Methods")
    print("-"*80)

    # Load data
    df = pd.read_excel('Super Sete.xlsx')
    n_sorteios = len(df)
    print(f"\nDataset: {n_sorteios} historical draws")

    digitos_previstos = []
    metricas_modelos = {'rf': [], 'xgb': []}

    # Process each column independently
    for col_idx in range(1, 8):
        col_name = f"Coluna {col_idx}"

        print(f"\n{'='*80}")
        print(f"Training models for {col_name}...")
        print(f"{'='*80}")

        if col_name not in df.columns:
            digitos_previstos.append(np.random.randint(0, 10))
            continue

        # Prepare features
        # Use the last 5 draws as features
        X = []
        y = []

        for i in range(5, len(df)):
            # Features: the 5 previous values of the column
            features = df[col_name].iloc[i-5:i].values
            X.append(features)
            # Target: the next value
            y.append(df[col_name].iloc[i])

        X = np.array(X)
        y = np.array(y)

        if len(X) < 50:
            print(f"  ⚠ Not enough data for {col_name}. Using fallback...")
            ultimo_valor = int(df[col_name].iloc[-1])
            digito = np.clip(ultimo_valor + np.random.randint(-1, 2), 0, 9)
            digitos_previstos.append(int(digito))
            continue

        # Split into training and test (80/20, chronological because shuffle=False)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42, shuffle=False
        )

        print(f"\n  Prepared data:")
        print(f"    - Training samples: {len(X_train)}")
        print(f"    - Test samples: {len(X_test)}")

        # ===== MODEL 1: RANDOM FOREST =====
        print(f"\n  [1/2] Training Random Forest...")
        rf_model = RandomForestClassifier(
            n_estimators=200,
            max_depth=10,
            min_samples_split=5,
            random_state=42,
            n_jobs=-1
        )
        rf_model.fit(X_train, y_train)

        # Evaluate Random Forest
        rf_pred_test = rf_model.predict(X_test)
        rf_accuracy = accuracy_score(y_test, rf_pred_test)
        rf_cv_scores = cross_val_score(rf_model, X_train, y_train, cv=3)

        print(f"    Test accuracy: {rf_accuracy:.4f}")
        print(f"    Mean CV score: {rf_cv_scores.mean():.4f} (+/- {rf_cv_scores.std():.4f})")

        # Random Forest prediction
        ultimos_5 = df[col_name].iloc[-5:].values.reshape(1, -1)
        rf_pred = rf_model.predict(ultimos_5)[0]
        rf_proba = rf_model.predict_proba(ultimos_5)[0]

        print(f"    RF prediction: {rf_pred} (confidence: {max(rf_proba):.2%})")

        # ===== MODEL 2: XGBOOST =====
        print(f"\n  [2/2] Training XGBoost...")
        xgb_model = XGBClassifier(
            n_estimators=200,
            max_depth=6,
            learning_rate=0.1,
            random_state=42,
            use_label_encoder=False,
            eval_metric='mlogloss',
            verbosity=0
        )
        xgb_model.fit(X_train, y_train)

        # Evaluate XGBoost
        xgb_pred_test = xgb_model.predict(X_test)
        xgb_accuracy = accuracy_score(y_test, xgb_pred_test)
        xgb_cv_scores = cross_val_score(xgb_model, X_train, y_train, cv=3)

        print(f"    Test accuracy: {xgb_accuracy:.4f}")
        print(f"    Mean CV score: {xgb_cv_scores.mean():.4f} (+/- {xgb_cv_scores.std():.4f})")

        # XGBoost prediction
        xgb_pred = xgb_model.predict(ultimos_5)[0]
        xgb_proba = xgb_model.predict_proba(ultimos_5)[0]

        print(f"    XGB prediction: {xgb_pred} (confidence: {max(xgb_proba):.2%})")

        # ===== ENSEMBLE: Combine predictions =====
        print(f"\n  [Ensemble] Combining models...")

        # Accuracy-weighted voting
        total_acc = rf_accuracy + xgb_accuracy
        rf_weight = rf_accuracy / total_acc if total_acc > 0 else 0.5
        xgb_weight = xgb_accuracy / total_acc if total_acc > 0 else 0.5

        # Combine probabilities, aligned to digits 0-9 through classes_
        # (predict_proba has one column per class seen in training, so a column
        # index only equals the digit when all ten digits occur in y_train)
        ensemble_proba = np.zeros(10)
        ensemble_proba[rf_model.classes_.astype(int)] += rf_proba * rf_weight
        ensemble_proba[xgb_model.classes_.astype(int)] += xgb_proba * xgb_weight
        digito_final = int(np.argmax(ensemble_proba))

        print(f"    Weights: RF={rf_weight:.2f}, XGB={xgb_weight:.2f}")
        print(f"    ✓ Final prediction (Ensemble): {digito_final}")

        digitos_previstos.append(digito_final)

        # Store metrics
        metricas_modelos['rf'].append({
            'coluna': col_idx,
            'accuracy': rf_accuracy,
            'cv_mean': rf_cv_scores.mean(),
            'cv_std': rf_cv_scores.std()
        })
        metricas_modelos['xgb'].append({
            'coluna': col_idx,
            'accuracy': xgb_accuracy,
            'cv_mean': xgb_cv_scores.mean(),
            'cv_std': xgb_cv_scores.std()
        })

    # Final results
    print("\n" + "="*80)
    print("ENSEMBLE PREDICTION RESULTS")
    print("="*80)
    print(f"\nPredicted digits: {digitos_previstos}")
    print("\nBreakdown per column:")
    for i, digito in enumerate(digitos_previstos, 1):
        print(f"  Column {i}: {digito}")

    # Model statistics
    if metricas_modelos['rf'] and metricas_modelos['xgb']:
        print("\n" + "-"*80)
        print("MODEL COMPARISON")
        print("-"*80)

        rf_avg_acc = np.mean([m['accuracy'] for m in metricas_modelos['rf']])
        xgb_avg_acc = np.mean([m['accuracy'] for m in metricas_modelos['xgb']])

        print(f"\n  Random Forest:")
        print(f"    Mean accuracy: {rf_avg_acc:.4f}")
        print(f"\n  XGBoost:")
        print(f"    Mean accuracy: {xgb_avg_acc:.4f}")

        if rf_avg_acc > xgb_avg_acc:
            print(f"\n  ✓ Random Forest had the better overall performance")
        elif xgb_avg_acc > rf_avg_acc:
            print(f"\n  ✓ XGBoost had the better overall performance")
        else:
            print(f"\n  ✓ Both models performed similarly")

    print("\n" + "="*80)
    print("ℹ️  ABOUT THE ENSEMBLE MODELS")
    print("="*80)
    print("Random Forest:")
    print("  - Ensemble of 200 decision trees")
    print("  - Robust against overfitting")
    print("  - Generalizes well across diverse data")
    print("\nXGBoost:")
    print("  - Optimized gradient boosting")
    print("  - Strong results in Kaggle competitions")
    print("  - Efficient on structured datasets")
    print("\nEnsemble:")
    print("  - Combines the strengths of both models")
    print("  - Performance-weighted voting")
    print("  - More robust predictions")

    print("\n" + "="*80)
    print("⚠️  WARNING: Academic study model")
    print("Lotteries are random events. Do not use for real bets.")
    print("="*80)

    return digitos_previstos

# Run the prediction
previsao_ensemble = prever_super_sete_ensemble()

SUPER SETE - ENSEMBLE MODELS (Random Forest + XGBoost)

Machine Learning Algorithms with Ensemble Methods
--------------------------------------------------------------------------------

Dataset: 769 historical draws

Training models for Coluna 1...

  Data prepared:
    - Training samples: 611
    - Test samples: 153

  [1/2] Training Random Forest...
    Test accuracy: 0.0784
    Mean CV score: 0.1162 (+/- 0.0221)
    RF prediction: 3 (confidence: 24.09%)

  [2/2] Training XGBoost...
    Test accuracy: 0.0915
    Mean CV score: 0.1178 (+/- 0.0210)
    XGB prediction: 9 (confidence: 34.35%)

  [Ensemble] Combining models...
    Weights: RF=0.46, XGB=0.54
    ✓ Final prediction (ensemble): 9

Training models for Coluna 2...

  Data prepared:
    - Training samples: 611
    - Test samples: 153

  [1/2] Training Random Forest...
    Test accuracy: 0.0654
    Mean CV score: 0.0868 (+/- 0.0168)
    RF prediction: 6 (confidence: 15.92%)

  [2/2] 
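
The RF and XGB weights printed for Coluna 1 in the output above (0.46 and 0.54) are consistent with weights proportional to each model's test accuracy (0.0784 and 0.0915). The weighting code itself is not part of this excerpt, so the following is only a minimal sketch under that assumption. The helper names `accuracy_weights` and `weighted_vote` are hypothetical, and the blend over per-digit probability vectors is one plausible way to do "performance-weighted voting", not necessarily the notebook's.

```python
def accuracy_weights(acc_rf, acc_xgb):
    """Normalize two accuracies so the weights sum to 1 (assumed scheme)."""
    total = acc_rf + acc_xgb
    if total == 0:
        return 0.5, 0.5  # no information: fall back to equal weights
    return acc_rf / total, acc_xgb / total


def weighted_vote(proba_rf, proba_xgb, w_rf, w_xgb):
    """Blend two per-class probability vectors and return the winning class index."""
    blended = [w_rf * a + w_xgb * b for a, b in zip(proba_rf, proba_xgb)]
    return max(range(len(blended)), key=blended.__getitem__)


# Test accuracies taken from the Coluna 1 output above
w_rf, w_xgb = accuracy_weights(0.0784, 0.0915)
print(f"RF={w_rf:.2f}, XGB={w_xgb:.2f}")
```

With accuracies this close to each other, and both below the 10% expected from guessing a digit uniformly, the weights mostly reflect noise in the test split.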

In [None]:
# ===============================================================================
# MODEL 3: ARIMA (Auto-Regressive Integrated Moving Average)
# ===============================================================================

import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller
import warnings
warnings.filterwarnings('ignore')

def prever_super_sete_arima():
    """
    ARIMA model for Super Sete prediction.
    Classical statistical method for time-series analysis.

    ARIMA parameters:
    - p (AR): autoregressive order - 5
    - d (I): differencing order - 1
    - q (MA): moving-average order - 2

    Model: ARIMA(5,1,2)
    """

    print("="*80)
    print("SUPER SETE - ARIMA MODEL (Auto-Regressive Integrated Moving Average)")
    print("="*80)
    print("\nStatistical Model for Time Series")
    print("-"*80)

    # Load data
    df = pd.read_excel('Super Sete.xlsx')
    n_sorteios = len(df)
    print(f"\nDataset: {n_sorteios} historical draws")

    # ARIMA parameters
    p, d, q = 5, 1, 2  # ARIMA(5,1,2)

    print(f"\nModel parameters: ARIMA({p},{d},{q})")
    print(f"  - p (AR): {p} autoregressive lags")
    print(f"  - d (I): {d} order of differencing")
    print(f"  - q (MA): {q} moving-average terms")

    digitos_previstos = []
    metricas_arima = []

    # Process each column independently
    for col_idx in range(1, 8):
        col_name = f"Coluna {col_idx}"

        print(f"\n{'='*80}")
        print(f"Training ARIMA for {col_name}...")
        print(f"{'='*80}")

        if col_name not in df.columns:
            digitos_previstos.append(np.random.randint(0, 10))
            continue

        # Prepare the time series
        serie = df[col_name].values

        # Check stationarity.
        # Note: the order (p, d, q) is fixed above, so d=1 differencing is applied
        # whether or not the ADF test finds the series stationary.
        adf_result = adfuller(serie)
        p_value = adf_result[1]

        print(f"\n  Stationarity test (ADF):")
        print(f"    - ADF Statistic: {adf_result[0]:.4f}")
        print(f"    - p-value: {p_value:.4f}")

        if p_value < 0.05:
            print(f"    ✓ Series is stationary")
            estacionaria = True
        else:
            print(f"    ⚠ Series is not stationary (differencing will be applied)")
            estacionaria = False

        try:
            # Fit the ARIMA model
            print(f"\n  Fitting ARIMA({p},{d},{q}) model...")
            model = ARIMA(serie, order=(p, d, q))
            model_fit = model.fit()

            # Model statistics
            print(f"\n  Model statistics:")
            print(f"    - AIC: {model_fit.aic:.2f}")
            print(f"    - BIC: {model_fit.bic:.2f}")
            print(f"    - Log-Likelihood: {model_fit.llf:.2f}")

            # One-step-ahead forecast
            forecast = model_fit.forecast(steps=1)
            predicao = forecast[0]

            # Round and clip to the valid range [0, 9]
            digito_previsto = int(np.clip(np.round(predicao), 0, 9))

            print(f"\n    ✓ Raw forecast: {predicao:.2f}")
            print(f"    ✓ Predicted digit: {digito_previsto}")

            digitos_previstos.append(digito_previsto)

            # Store metrics
            metricas_arima.append({
                'coluna': col_idx,
                'aic': model_fit.aic,
                'bic': model_fit.bic,
                'estacionaria': estacionaria,
                'predicao_bruta': predicao
            })

        except Exception as e:
            print(f"\n    ⚠ Error fitting ARIMA: {str(e)}")
            print(f"    Using fallback...")

            # Fallback: mean of the last 5 values
            ultimos_5 = serie[-5:]
            media = np.mean(ultimos_5)
            digito = int(np.clip(np.round(media), 0, 9))
            digitos_previstos.append(digito)

    # Final results
    print("\n" + "="*80)
    print("ARIMA PREDICTION RESULTS")
    print("="*80)
    print(f"\nPredicted digits: {digitos_previstos}")
    print("\nBreakdown by column:")
    for i, digito in enumerate(digitos_previstos, 1):
        print(f"  Coluna {i}: {digito}")

    # Overall statistics
    if metricas_arima:
        print("\n" + "-"*80)
        print("MODEL STATISTICS")
        print("-"*80)

        avg_aic = np.mean([m['aic'] for m in metricas_arima])
        avg_bic = np.mean([m['bic'] for m in metricas_arima])
        n_estacionarias = sum([1 for m in metricas_arima if m['estacionaria']])

        print(f"\n  Criterion averages:")
        print(f"    - Mean AIC: {avg_aic:.2f}")
        print(f"    - Mean BIC: {avg_bic:.2f}")
        print(f"\n  Stationarity:")
        print(f"    - Stationary series: {n_estacionarias}/7")
        print(f"    - Non-stationary series: {7-n_estacionarias}/7")

        if n_estacionarias >= 4:
            print(f"\n  ✓ Most series are stationary (good for ARIMA)")
        else:
            print(f"\n  ⚠ Few stationary series (ARIMA may be limited)")

    print("\n" + "="*80)
    print("ℹ️  ABOUT THE ARIMA MODEL")
    print("="*80)
    print("ARIMA combines three components:")
    print("  1. AR (AutoRegressive): uses past values of the series")
    print("  2. I (Integrated): applies differencing to make the series stationary")
    print("  3. MA (Moving Average): models past forecast errors")
    print("\nAdvantages:")
    print("  - Classical, well-established statistical method")
    print("  - Solid theoretical interpretation")
    print("  - Well suited to stationary time series")
    print("\nLimitations:")
    print("  - Assumes linearity")
    print("  - Requires stationarity")
    print("  - May struggle with complex patterns")

    print("\n" + "="*80)
    print("⚠️  WARNING: Academic study model")
    print("Lotteries are random events. Do not use for real bets.")
    print("="*80)

    return digitos_previstos

# Run the prediction
previsao_arima = prever_super_sete_arima()

SUPER SETE - ARIMA MODEL (Auto-Regressive Integrated Moving Average)

Statistical Model for Time Series
--------------------------------------------------------------------------------

Dataset: 769 historical draws

Model parameters: ARIMA(5,1,2)
  - p (AR): 5 autoregressive lags
  - d (I): 1 order of differencing
  - q (MA): 2 moving-average terms

Training ARIMA for Coluna 1...

  Stationarity test (ADF):
    - ADF Statistic: -27.8972
    - p-value: 0.0000
    ✓ Series is stationary

  Fitting ARIMA(5,1,2) model...

  Model statistics:
    - AIC: 3819.03
    - BIC: 3856.18
    - Log-Likelihood: -1901.52

    ✓ Raw forecast: 4.53
    ✓ Predicted digit: 5

Training ARIMA for Coluna 2...

  Stationarity test (ADF):
    - ADF Statistic: -27.5098
    - p-value: 0.0000
    ✓ Series is stationary

  Fitting ARIMA(5,1,2) model...

  Model statistics:
    - AIC: 3796.19
    - BIC: 3833.34
    - Log-Likelihood: -1890.
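
AIC, BIC, and log-likelihood describe how well the model fits the history; they do not say how often the rounded forecast matches the next digit. One way to check that is a walk-forward backtest, sketched below under stated assumptions. The helpers `walk_forward_hit_rate` and `rounded_mean_forecast` are hypothetical names, the stand-in forecaster mirrors the function's fallback rule (rounded mean of the last 5 values) rather than ARIMA itself, and the toy series is made up.

```python
def rounded_mean_forecast(history, window=5):
    """Stand-in forecaster: rounded mean of the last `window` values, clipped to 0-9."""
    recent = history[-window:]
    return min(9, max(0, round(sum(recent) / len(recent))))


def walk_forward_hit_rate(series, forecast_fn, start):
    """Forecast each value from index `start` on, using only the values before it."""
    hits = 0
    trials = 0
    for t in range(start, len(series)):
        if forecast_fn(series[:t]) == series[t]:
            hits += 1
        trials += 1
    return hits / trials if trials else 0.0


# Toy series (made up for illustration, not lottery data)
toy = [4, 4, 4, 4, 4, 4, 9, 4, 4, 4]
rate = walk_forward_hit_rate(toy, rounded_mean_forecast, start=5)
print(f"hit rate: {rate:.2f} (uniform-guess baseline for one digit: 0.10)")
```

Any one-step forecaster with the same call shape, including a wrapper that refits ARIMA on `series[:t]`, could be passed as `forecast_fn`; a hit rate that stays near 0.10 would indicate no advantage over guessing.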

In [None]:
# ====================================================================================
# FULL ADVANCED MODEL: 4-Layer LSTM + RF + XGBoost + Statistical Analysis
# ====================================================================================

import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from scipy.stats import ks_2samp
import warnings
warnings.filterwarnings('ignore')

def prever_super_sete_avancado():
    print("="*90)
    print("SUPER SETE - ADVANCED HYBRID MODEL")
    print("4-Layer LSTM + Random Forest + XGBoost + KS Statistical Analysis")
    print("="*90)

    df = pd.read_excel('Super Sete.xlsx')
    n_sorteios = len(df)
    print(f"\nDataset: {n_sorteios} draws\n")

    digitos_previstos = []
    resultados_testes = []

    for col_idx in range(1, 8):
        col_name = f"Coluna {col_idx}"
        print(f"\n{'='*90}")
        print(f"Processing {col_name}")
        print(f"{'='*90}")

        if col_name not in df.columns:
            digitos_previstos.append(np.random.randint(0, 10))
            continue

        dados = df[col_name].values

        # === 1. ADVANCED PREPROCESSING ===
        print("\n[1/5] Preprocessing and feature engineering...")

        # Normalization
        scaler = MinMaxScaler()
        dados_norm = scaler.fit_transform(dados.reshape(-1, 1))

        # Additional features (each as a column vector)
        paridade = (dados % 2).reshape(-1, 1)  # even/odd
        rolling_mean = pd.Series(dados).rolling(5, min_periods=1).mean().values.reshape(-1, 1)
        rolling_std = pd.Series(dados).rolling(5, min_periods=1).std().fillna(0).values.reshape(-1, 1)

        # Combine features (rolling mean divided by 10 to keep it near [0, 1])
        features_combined = np.hstack([dados_norm, paridade, rolling_mean / 10, rolling_std])

        # === 2. 4-LAYER LSTM MODEL ===
        print("[2/5] Training 4-layer bidirectional LSTM...")

        # Build sequences for the LSTM
        lookback = 20
        if len(dados_norm) < lookback + 1:
            print(f"    ⚠️ Not enough data for {col_name}. Skipping...")
            digitos_previstos.append(np.random.randint(0, 10))
            continue

        X, y = [], []
        for i in range(lookback, len(dados_norm)):
            X.append(features_combined[i-lookback:i])
            y.append(dados_norm[i])

        X = np.array(X)
        y = np.array(y)

        # Train/validation split (chronological)
        split_idx = int(len(X) * 0.8)
        X_train, X_val = X[:split_idx], X[split_idx:]
        y_train, y_val = y[:split_idx], y[split_idx:]

        # Reshape y to match the LSTM output shape (n_samples, 1)
        y_train = y_train.reshape(-1, 1)
        y_val = y_val.reshape(-1, 1)

        # Build the 4-layer bidirectional LSTM model
        from tensorflow.keras.models import Sequential
        from tensorflow.keras.layers import Input, LSTM, Bidirectional, Dense, Dropout, SpatialDropout1D
        from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

        model_lstm = Sequential([
            # Declare the input shape once, up front: (timesteps, features)
            Input(shape=(X_train.shape[1], X_train.shape[2])),
            SpatialDropout1D(0.2),
            Bidirectional(LSTM(128, return_sequences=True)),
            Dropout(0.3),
            Bidirectional(LSTM(64, return_sequences=True)),
            Dropout(0.3),
            Bidirectional(LSTM(32, return_sequences=True)),
            Dropout(0.2),
            Bidirectional(LSTM(16)),
            Dropout(0.2),
            Dense(32, activation='relu'),
            Dense(1)
        ])

        model_lstm.compile(optimizer='adam', loss='mse', metrics=['mae'])

        callbacks = [
            EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
            ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=0.00001)
        ]

        history = model_lstm.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=50,
            batch_size=16,
            callbacks=callbacks,
            verbose=0
        )

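        # LSTM prediction
        ultima_seq = features_combined[-lookback:].reshape(1, lookback, -1)
        pred_lstm_norm = model_lstm.predict(ultima_seq, verbose=0)[0][0]
        # Clip to the valid digit range instead of wrapping with % 10,
        # which would turn an out-of-range 10 into 0 and -1 into 9
        pred_lstm = int(np.clip(np.round(scaler.inverse_transform([[pred_lstm_norm]])[0][0]), 0, 9))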

        # === 3. RANDOM FOREST ===
        print(f"[3/5] Training Random Forest for {col_name}...")
        from sklearn.ensemble import RandomForestClassifier

        # Build the RF dataset (features without normalization):
        # the features of draw i-1 are used to predict the digit of draw i
        X_rf = []
        y_rf = []
        for i in range(lookback, len(dados)):
            features_row = [
                paridade[i-1][0],
                rolling_mean[i-1][0],
                rolling_std[i-1][0]
            ]
            X_rf.append(features_row)
            y_rf.append(dados[i])

        X_rf = np.array(X_rf)
        y_rf = np.array(y_rf)

        rf_model = RandomForestClassifier(n_estimators=150, max_depth=15, random_state=42)
        rf_model.fit(X_rf, y_rf)

        # RF prediction from the most recent feature row
        ultima_features_rf = np.array([[
            paridade[-1][0],
            rolling_mean[-1][0],
            rolling_std[-1][0]
        ]])
        pred_rf = rf_model.predict(ultima_features_rf)[0]

        # Feature importances
        importances = rf_model.feature_importances_
        print(f"    Feature Importances: Parity={importances[0]:.3f}, RollingMean={importances[1]:.3f}, RollingStd={importances[2]:.3f}")

        # === 4. XGBOOST ===
        print(f"[4/5] Training XGBoost for {col_name}...")
        import xgboost as xgb

        xgb_model = xgb.XGBClassifier(
            n_estimators=150,
            max_depth=6,
            learning_rate=0.1,
            random_state=42
        )
        xgb_model.fit(X_rf, y_rf)

        pred_xgb = xgb_model.predict(ultima_features_rf)[0]

        # === 5. ENSEMBLE AND STATISTICAL TEST ===
        print(f"[5/5] Ensemble and statistical analysis for {col_name}...")

        # Weighted ensemble. Note: this averages the predicted digits as if they were
        # quantities, although RF and XGBoost output class labels.
        pred_ensemble = int(np.round((pred_lstm * 0.4 + pred_rf * 0.3 + pred_xgb * 0.3))) % 10

        # Kolmogorov-Smirnov test
        from scipy.stats import kstest

        # Reference sample drawn uniformly from the integers 0-9.
        # Passing an array as the second argument makes kstest run a two-sample test,
        # so the p-value varies from run to run with this random sample.
        uniform_dist = np.random.randint(0, 10, len(dados))
        ks_statistic, p_value = kstest(dados, uniform_dist)

        print(f"    KS Test: statistic={ks_statistic:.4f}, p-value={p_value:.4f}")
        if p_value > 0.05:
            print(f"    ✅ No evidence against uniformity (H0 not rejected)")
        else:
            print(f"    ⚠️ Distribution differs from the uniform sample (H0 rejected)")

        digitos_previstos.append(pred_ensemble)
        resultados_testes.append({
            'coluna': col_name,
            'ks_stat': ks_statistic,
            'p_value': p_value,
            'pred_lstm': pred_lstm,
            'pred_rf': pred_rf,
            'pred_xgb': pred_xgb,
            'pred_ensemble': pred_ensemble
        })

    print("\n" + "="*80)
    print("FINAL RESULTS - ADVANCED MODEL")
    print("="*80)
    print(f"\n🎯 Predictions: {digitos_previstos}")
    print("\n📊 Statistical summary:")
    for res in resultados_testes:
        print(f"  {res['coluna']}: LSTM={res['pred_lstm']}, RF={res['pred_rf']}, XGB={res['pred_xgb']}, Ensemble={res['pred_ensemble']} | KS p-value={res['p_value']:.4f}")

    return digitos_previstos

prever_super_sete_avancado()

SUPER SETE - ADVANCED HYBRID MODEL
4-Layer LSTM + Random Forest + XGBoost + KS Statistical Analysis

Dataset: 769 draws


Processing Coluna 1

[1/5] Preprocessing and feature engineering...
[2/5] Training 4-layer bidirectional LSTM...
[3/5] Training Random Forest for Coluna 1...
    Feature Importances: Parity=0.016, RollingMean=0.378, RollingStd=0.606
[4/5] Training XGBoost for Coluna 1...
[5/5] Ensemble and statistical analysis for Coluna 1...
    KS Test: statistic=0.0234, p-value=0.9845
    ✅ No evidence against uniformity (H0 not rejected)

Processing Coluna 2

[1/5] Preprocessing and feature engineering...
[2/5] Training 4-layer bidirectional LSTM...
[3/5] Training Random Forest for Coluna 2...
    Feature Importances: Parity=0.017, RollingMean=0.360, RollingStd=0.623
[4/5] Training XGBoost for Coluna 2...
[5/5] Ensemble and statistical analysis for Coluna 2...
    KS Test: statistic=0.0325, p-value=0.8115
    ‚úÖ Distribui√ß√£o √© aleat√≥ria (n√

[4, 6, 4, 6, 4, 6, 3]
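
The Kolmogorov-Smirnov test in the cell above is designed for continuous distributions, and it compares the column against a freshly drawn random sample, so its p-value changes between runs. For digits 0-9, a chi-square goodness-of-fit test against equal expected counts is a common alternative. The following is a dependency-free sketch of that idea, not the notebook's method: `chi_square_uniform` is a hypothetical helper, the samples are made up, and 16.919 is the standard table critical value for 9 degrees of freedom at the 5% level.

```python
from collections import Counter

CHI2_CRITICAL_DF9_ALPHA05 = 16.919  # table value: 9 degrees of freedom, alpha = 0.05


def chi_square_uniform(digits, n_categories=10):
    """Pearson chi-square statistic of observed digit counts against equal expected counts."""
    counts = Counter(digits)
    expected = len(digits) / n_categories
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(n_categories))


# Made-up samples for illustration (not lottery data)
balanced = list(range(10)) * 50            # every digit appears exactly 50 times
skewed = [7] * 300 + list(range(10)) * 20  # digit 7 heavily over-represented

for name, sample in [("balanced", balanced), ("skewed", skewed)]:
    stat = chi_square_uniform(sample)
    if stat > CHI2_CRITICAL_DF9_ALPHA05:
        verdict = "reject uniformity"
    else:
        verdict = "no evidence against uniformity"
    print(f"{name}: chi2={stat:.2f} -> {verdict}")
```

With SciPy available, `scipy.stats.chisquare` applied to the ten observed counts returns the same statistic together with a p-value.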

In [None]:
# ===============================================================================
# MODEL 4: Monte Carlo Simulations - Stochastic Simulation
# ===============================================================================

import pandas as pd
import numpy as np
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

def prever_super_sete_monte_carlo():
    """
    Monte Carlo model for Super Sete prediction.
    Runs thousands of stochastic simulations based on historical distributions.

    Parameters:
    - Number of simulations: 100,000
    - Method: bootstrap from the empirical distribution
    - Confidence level: 95%
    """

    print("="*80)
    print("SUPER SETE - MONTE CARLO MODEL (Stochastic Simulation)")
    print("="*80)
    print("\nSimulation Method for Estimating Probabilities")
    print("-"*80)

    # Load data
    df = pd.read_excel('Super Sete.xlsx')
    n_sorteios = len(df)
    print(f"\nDataset: {n_sorteios} historical draws")

    # Monte Carlo parameters
    N_SIMULACOES = 100000
    CONFIDENCE_LEVEL = 0.95

    print(f"\nSimulation parameters:")
    print(f"  - Number of simulations: {N_SIMULACOES:,}")
    print(f"  - Confidence level: {CONFIDENCE_LEVEL:.0%}")
    print(f"  - Method: bootstrap with resampling")

    digitos_previstos = []
    analises_colunas = []
    # Process each column independently
    for col_idx in range(1, 8):
        col_name = f"Coluna {col_idx}"

        print(f"\n{'='*80}")
        print(f"Running Monte Carlo simulation for {col_name}...")
        print(f"{'='*80}")

        if col_name not in df.columns:
            digitos_previstos.append(np.random.randint(0, 10))
            continue

        # Historical data for the column
        dados_historicos = df[col_name].values

        # Compute historical statistics
        media_hist = np.mean(dados_historicos)
        std_hist = np.std(dados_historicos)

        print(f"\n  Historical statistics:")
        print(f"    - Mean: {media_hist:.2f}")
        print(f"    - Standard deviation: {std_hist:.2f}")
        print(f"    - Minimum: {dados_historicos.min()}")
        print(f"    - Maximum: {dados_historicos.max()}")

        # ===== MONTE CARLO SIMULATION =====
        print(f"\n  Running {N_SIMULACOES:,} simulations...")

        # Method 1: bootstrap (resampling the historical data)
        simulacoes_bootstrap = np.random.choice(
            dados_historicos,
            size=N_SIMULACOES,
            replace=True
        )

        # Method 2: parametric simulation (assuming a normal distribution)
        simulacoes_parametrica = np.random.normal(
            loc=media_hist,
            scale=std_hist,
            size=N_SIMULACOES
        )
        # Clip to the valid range [0, 9]
        simulacoes_parametrica = np.clip(simulacoes_parametrica, 0, 9)

        # Combine both methods (50% each)
        simulacoes_combinadas = np.concatenate([
            simulacoes_bootstrap[:N_SIMULACOES//2],
            simulacoes_parametrica[:N_SIMULACOES//2]
        ])

        # Analyze the simulation results
        valores_unicos, contagens = np.unique(
            np.round(simulacoes_combinadas).astype(int),
            return_counts=True
        )

        # Compute probabilities
        probabilidades = contagens / len(simulacoes_combinadas)

        # Build the probability dictionary
        prob_dict = {int(v): float(p) for v, p in zip(valores_unicos, probabilidades)}

        # Ensure every digit 0-9 has a probability
        for d in range(10):
            if d not in prob_dict:
                prob_dict[d] = 0.0

        print(f"\n  Probability distribution (via Monte Carlo):")
        for digito in range(10):
            prob = prob_dict.get(digito, 0)
            barra = '█' * int(prob * 50)  # visual bar
            print(f"    Digit {digito}: {prob:.4f} {barra}")
        # Select the most probable digit
        digito_mais_provavel = max(prob_dict.items(), key=lambda x: x[1])[0]

        # Compute the interval. Note: this is the central percentile range of the
        # simulated values, not a confidence interval for the predicted digit.
        alpha = 1 - CONFIDENCE_LEVEL
        lower_percentile = (alpha/2) * 100
        upper_percentile = (1 - alpha/2) * 100

        ic_lower = np.percentile(simulacoes_combinadas, lower_percentile)
        ic_upper = np.percentile(simulacoes_combinadas, upper_percentile)

        print(f"\n  Results:")
        print(f"    - Most probable digit: {digito_mais_provavel}")
        print(f"    - Probability: {prob_dict[digito_mais_provavel]:.2%}")
        print(f"    - {CONFIDENCE_LEVEL:.0%} confidence interval: [{ic_lower:.1f}, {ic_upper:.1f}]")

        digitos_previstos.append(digito_mais_provavel)

        # Store the analysis
        analises_colunas.append({
            'coluna': col_idx,
            'digito': digito_mais_provavel,
            'probabilidade': prob_dict[digito_mais_provavel],
            'ic_lower': ic_lower,
            'ic_upper': ic_upper,
            'media_simulacoes': np.mean(simulacoes_combinadas),
            'std_simulacoes': np.std(simulacoes_combinadas)
        })

    # Final results
    print("\n" + "="*80)
    print("MONTE CARLO SIMULATION RESULTS")
    print("="*80)
    print(f"\nPredicted digits: {digitos_previstos}")
    print("\nBreakdown by column:")
    for i, digito in enumerate(digitos_previstos, 1):
        analise = analises_colunas[i-1] if i-1 < len(analises_colunas) else None
        if analise:
            print(f"  Coluna {i}: {digito} (prob: {analise['probabilidade']:.2%})")
        else:
            print(f"  Coluna {i}: {digito}")

    # Overall statistics
    if analises_colunas:
        print("\n" + "-"*80)
        print("SIMULATION STATISTICS")
        print("-"*80)

        avg_prob = np.mean([a['probabilidade'] for a in analises_colunas])
        avg_std = np.mean([a['std_simulacoes'] for a in analises_colunas])

        print(f"\n  Mean probability of the chosen digits: {avg_prob:.2%}")
        print(f"  Mean standard deviation of the simulations: {avg_std:.2f}")

        # Prediction quality
        if avg_prob > 0.15:
            print(f"\n  ✓ High confidence in the predictions")
        elif avg_prob > 0.12:
            print(f"\n  ✓ Moderate confidence in the predictions")
        else:
            print(f"\n  ⚠ Low confidence (close to a uniform distribution)")

    print("\n" + "="*80)
    print("ℹ️  ABOUT THE MONTE CARLO MODEL")
    print("="*80)
    print("The Monte Carlo method:")
    print("  - Runs thousands of random simulations")
    print("  - Based on real historical distributions")
    print("  - Combines bootstrap and parametric simulation")
    print("  - Provides confidence intervals")
    print("\nAdvantages:")
    print("  - Quantifies uncertainty rigorously")
    print("  - Bootstrap half assumes no specific distribution (the parametric half assumes a normal)")
    print("  - Robust for small datasets")
    print("  - Widely used in finance and engineering")
    print("\nLimitations:")
    print("  - Computationally intensive")
    print("  - Results vary between runs (stochastic)")
    print("  - Quality depends on the historical data")

    print("\n" + "="*80)
    print("⚠️  WARNING: Academic study model")
    print("Lotteries are random events. Do not use for real bets.")
    print("This model quantifies uncertainty; it does not guarantee hits.")
    print("="*80)

    return digitos_previstos

# Run the simulation
previsao_monte_carlo = prever_super_sete_monte_carlo()

SUPER SETE - MONTE CARLO MODEL (Stochastic Simulation)

Simulation Method for Estimating Probabilities
--------------------------------------------------------------------------------

Dataset: 770 historical draws

Simulation parameters:
  - Number of simulations: 100,000
  - Confidence level: 95%
  - Method: bootstrap with resampling

Running Monte Carlo simulation for Coluna 1...

  Historical statistics:
    - Mean: 4.33
    - Standard deviation: 2.88
    - Minimum: 0
    - Maximum: 9

  Running 100,000 simulations...

  Probability distribution (via Monte Carlo):
    Digit 0: 0.1057 █████
    Digit 1: 0.0852 ████
    Digit 2: 0.0963 ████
    Digit 3: 0.1162 █████
    Digit 4: 0.1187 █████
    Digit 5: 0.1205 ██████
    Digit 6: 0.1059 █████
    Digit 7: 0.0926 ████
    Digit 8: 0.0744 ███
    Digit 9: 0.0845 ████

  Results:
    - D
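
Resampling the historical values with replacement reproduces their relative frequencies, so the "most probable digit" from the bootstrap half of the simulation converges to the historical mode; the simulation adds sampling noise rather than new information. The following sketch illustrates this with the standard library only. The history is made up (not lottery data) and `empirical_probs` is a hypothetical helper.

```python
import random
from collections import Counter


def empirical_probs(values):
    """Relative frequency of each digit 0-9 in `values`."""
    counts = Counter(values)
    return [counts.get(d, 0) / len(values) for d in range(10)]


# Made-up history for illustration: digit 3 is clearly the mode
history = [3] * 30 + [d for d in range(10) if d != 3] * 10

rng = random.Random(42)
bootstrap = rng.choices(history, k=100_000)  # resampling with replacement

exact = empirical_probs(history)
simulated = empirical_probs(bootstrap)
max_gap = max(abs(a - b) for a, b in zip(exact, simulated))

print("historical mode:", exact.index(max(exact)))
print("bootstrap mode: ", simulated.index(max(simulated)))
print(f"largest probability gap: {max_gap:.4f}")
```

Counting the historical frequencies directly gives the same ranking of digits without any random draws.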

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.callbacks import EarlyStopping

# Load data
df = pd.read_excel('Mega-Sena.xlsx')
ball_cols = [f'Bola{i}' for i in range(1, 7)]
balls = df[ball_cols]

# Normalize (note: the scaler is fitted on the full history, test rows included)
scaler = MinMaxScaler()
balls_norm = scaler.fit_transform(balls)

# Build sequential samples (sliding window)
seq_len = 5
X, y = [], []
for i in range(len(balls_norm)-seq_len):
    X.append(balls_norm[i:i+seq_len].flatten())
    y.append(balls_norm[i+seq_len])
X, y = np.array(X), np.array(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# MLP model
model = Sequential([
    Input(shape=(X.shape[1],)),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(6, activation='sigmoid')
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
# Note: the test split doubles as the validation set for early stopping
es = EarlyStopping(patience=10, restore_best_weights=True)
model.fit(X_train, y_train, epochs=100, batch_size=32,
          validation_data=(X_test, y_test), verbose=0, callbacks=[es])

# Prediction
y_pred_norm = model.predict(X_test)
y_pred = scaler.inverse_transform(y_pred_norm)
y_true = scaler.inverse_transform(y_test)

# Metrics
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# Evaluation by number of hits
pred_rounded = [[int(round(x)) for x in linha] for linha in y_pred]
true_int    = [[int(round(x)) for x in linha] for linha in y_true]
results = []
for pred, true in zip(pred_rounded, true_int):
    acertos = len(set(pred) & set(true))
    results.append(acertos)

# Summary report
df_results = pd.DataFrame({
    'Real': true_int,
    'Previsto': pred_rounded,
    'Acertos': results
})
print('--- Results for the first draws ---')
print(df_results.head(10))
print(f'\nMAE: {mae:.2f} | RMSE: {rmse:.2f}')
print(f'Mean hits: {np.mean(results):.2f}')
print(f'Max hits: {np.max(results)}')
print(f'Draws with 4+ hits: {(np.array(results)>=4).sum()}')


19/19 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step
--- Results for the first draws ---
                       Real                  Previsto  Acertos
0   [3, 19, 34, 41, 48, 53]   [8, 18, 27, 35, 45, 53]        1
1   [6, 18, 25, 30, 42, 54]   [9, 19, 27, 36, 44, 53]        0
2   [7, 30, 31, 41, 50, 56]   [8, 19, 26, 35, 43, 51]        0
3   [3, 10, 25, 36, 51, 58]   [8, 18, 26, 35, 43, 53]        0
4  [19, 28, 30, 34, 40, 51]   [8, 17, 26, 35, 44, 53]        0
5    [5, 9, 11, 16, 43, 57]   [9, 18, 27, 36, 44, 53]        1
6  [31, 32, 39, 42, 43, 51]   [9, 17, 26, 35, 44, 52]        0
7  [10, 15, 21, 24, 29, 45]   [8, 18, 27, 36, 44, 53]        0
8  [14, 21, 22, 29, 35, 46]  [10, 19, 27, 35, 43, 51]        1
9   [3, 20, 22, 32, 35, 50]  [10, 19, 27, 36, 43, 52]        0

MAE: 6.97 | RMSE: 8.75
Mean hits: 0.66
Max hits: 4
Draws with 4+ hits: 1
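
To judge the reported mean of 0.66 hits, it helps to compare it with pure chance. If 6 distinct numbers are picked at random from 60, the number of matches with the 6 drawn numbers follows a hypergeometric distribution with expected value 6 × 6 / 60 = 0.6. The sketch below computes that baseline; `match_probability` is a hypothetical helper, and the comparison is approximate because the rounded MLP outputs are not guaranteed to be 6 distinct numbers.

```python
from math import comb


def match_probability(k, pool=60, drawn=6, picked=6):
    """P(exactly k of `picked` distinct numbers are among the `drawn` ones)."""
    return comb(drawn, k) * comb(pool - drawn, picked - k) / comb(pool, picked)


probs = [match_probability(k) for k in range(7)]
expected_hits = sum(k * p for k, p in enumerate(probs))
p_four_plus = sum(probs[4:])

print(f"expected hits per draw: {expected_hits:.2f}")
print(f"P(4 or more hits): {p_four_plus:.6f}")
```

A mean hit count close to this baseline suggests the network is not extracting usable signal from the previous draws.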


In [None]:
import pandas as pd
import numpy as np

# Load draws
df = pd.read_excel('Mega-Sena.xlsx')
balls = df[[f'Bola{i}' for i in range(1,7)]]

# Compute the historical frequency of each number
# (counted over the full history, so each row also reflects later draws)
freq_global = balls.apply(pd.Series.value_counts).sum(axis=1)
freq_dict = freq_global.to_dict()
# Column-wise Series.map replaces the deprecated DataFrame.applymap
balls_freq = balls.apply(lambda col: col.map(freq_dict).fillna(0))

# Compute the delay (number of draws since each number last appeared)
def calc_atrasos(balls):
    atrasos = {n: 0 for n in range(1,61)}
    matriz = []
    for _, sorteio in balls.iterrows():
        atual = []
        for n in sorteio:
            atual.append(atrasos[n])
            atrasos[n] = 0  # reset the delay of the drawn number
        # For numbers that were not drawn: increment by 1
        for n in atrasos.keys():
            if n not in sorteio.values:
                atrasos[n] += 1
        matriz.append(atual)
    return pd.DataFrame(matriz, columns=[f'Atraso{i}' for i in range(1,7)])

balls_atrasos = calc_atrasos(balls)

# Build the final feature matrix (draw + frequency + delay)
X_enhanced = pd.concat([balls, balls_freq, balls_atrasos], axis=1)
print(X_enhanced.head())

# The new features can be normalized and fed to the neural model
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_enhanced)

# Example for a sequential model (sliding window)
seq_len = 5
X_seq, y_seq = [], []
for i in range(len(X_scaled)-seq_len):
    X_seq.append(X_scaled[i:i+seq_len].flatten())
    y_seq.append(X_scaled[i+seq_len][:6]) # only the actual balls as target

X_seq = np.array(X_seq)
y_seq = np.array(y_seq)

print('Shape with advanced features:', X_seq.shape, y_seq.shape)


   Bola1  Bola2  Bola3  Bola4  Bola5  Bola6  Bola1  Bola2  Bola3  Bola4  \
0      4      5     30     33     41     52  311.0  320.0  308.0  314.0   
1      9     37     39     41     43     49  279.0  317.0  280.0  305.0   
2     10     11     29     30     36     47  343.0  310.0  293.0  308.0   
3      1      5      6     27     42     59  285.0  320.0  292.0  311.0   
4      1      2      6     16     19     46  285.0  294.0  292.0  305.0   

   Bola5  Bola6  Atraso1  Atraso2  Atraso3  Atraso4  Atraso5  Atraso6  
0  305.0  297.0        0        0        0        0        0        0  
1  308.0  297.0        1        1        1        0        1        1  
2  298.0  280.0        2        2        2        1        2        2  
3  309.0  284.0        3        2        3        3        3        3  
4  287.0  308.0        0        4        0        4        4        4  
Shape com features avançadas: (2934, 90) (2934, 6)
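
The frequency feature in the cell above is counted over the full history, so each row's value already reflects draws that come after it. A minimal sketch of a past-only alternative follows, assuming the draws are in chronological order. `past_only_frequency` is a hypothetical helper, not part of the notebook, and the three short draws are made-up illustration data.

```python
from collections import Counter

def past_only_frequency(draws):
    """For each draw, count how often each of its numbers appeared in EARLIER draws only."""
    seen = Counter()
    features = []
    for draw in draws:
        features.append([seen[n] for n in draw])  # counts before this draw is registered
        seen.update(draw)                          # now register the current draw
    return features

# Made-up draws, only to illustrate the mechanics
draws = [[4, 5, 30], [5, 9, 30], [4, 5, 41]]
freq = past_only_frequency(draws)
print(freq)  # [[0, 0, 0], [1, 0, 1], [1, 2, 0]]
```

With the notebook's data, the same helper could be called as `past_only_frequency(balls.values.tolist())`.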


In [None]:
# --- Required imports ---
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input, LSTM
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# --- Load data ---
df = pd.read_excel('Mega-Sena.xlsx')
balls = df[[f'Bola{i}' for i in range(1,7)]]

# --- Feature engineering: frequency, delay, moving averages ---
# Historical frequency (counted over the full history, including later draws)
freq_global = balls.apply(pd.Series.value_counts).sum(axis=1)
freq_dict = freq_global.to_dict()
# DataFrame.map replaces the deprecated applymap (pandas >= 2.1)
balls_freq = balls.map(lambda x: freq_dict.get(x,0))

# Delay (number of draws since each number last appeared)
def calc_atrasos(balls):
    atrasos = {n: 0 for n in range(1,61)}
    matriz = []
    for _, sorteio in balls.iterrows():
        atual = []
        for n in sorteio:
            atual.append(atrasos[n])
            atrasos[n] = 0
        for n in atrasos.keys():
            if n not in sorteio.values:
                atrasos[n] += 1
        matriz.append(atual)
    return pd.DataFrame(matriz, columns=[f'Atraso{i}' for i in range(1,7)])
balls_atrasos = calc_atrasos(balls)

# Moving average of each ball position over the previous 5 draws
def moving_average_feature(balls, window=5):
    medias = []
    for col in balls.columns:
        col_vals = balls[col].tolist()
        col_media = [np.mean(col_vals[max(0,i-window):i]) if i > 0 else col_vals[0] for i in range(len(col_vals))]
        medias.append(col_media)
    return pd.DataFrame(np.array(medias).T, columns=[f'MedMovel{i}' for i in range(1,7)])
balls_medias = moving_average_feature(balls)

# Final feature matrix
X = pd.concat([balls, balls_freq, balls_atrasos, balls_medias], axis=1)
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# --- Sliding window for the temporal model (seq_len=5) ---
seq_len = 5
X_seq, y_seq = [], []
for i in range(len(X_scaled)-seq_len):
    X_seq.append(X_scaled[i:i+seq_len].flatten())
    y_seq.append(X_scaled[i+seq_len][:6])  # target: only the drawn balls

X_seq = np.array(X_seq)
y_seq = np.array(y_seq)

# --- Train/test split (chronological, no shuffling) ---
X_train, X_test, y_train, y_test = train_test_split(X_seq, y_seq, test_size=0.2, shuffle=False)

# --- Neural model (MLP) ---
model = Sequential([
    Input(shape=(X_seq.shape[1],)),
    Dense(128, activation='relu'),
    Dense(32, activation='relu'),
    Dense(6, activation='sigmoid')
])

model.compile(optimizer='adam', loss='mse', metrics=['mae'])
es = EarlyStopping(patience=10, restore_best_weights=True)
rlr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5)
model.fit(X_train, y_train, epochs=100, batch_size=32,
          validation_data=(X_test, y_test), verbose=2, callbacks=[es, rlr])

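# --- Evaluation ---
y_pred_norm = model.predict(X_test)
# The scaler was fit on every feature column, so pad with zeros to the full width,
# invert the scaling, and keep only the first 6 columns (the balls)
y_pred = scaler.inverse_transform(np.hstack([y_pred_norm, np.zeros((y_pred_norm.shape[0], X.shape[1]-6))]))[:, :6]
y_true = scaler.inverse_transform(np.hstack([y_test, np.zeros((y_test.shape[0], X.shape[1]-6))]))[:, :6]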

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
pred_rounded = [[int(round(x)) for x in linha] for linha in y_pred]
true_int    = [[int(round(x)) for x in linha] for linha in y_true]
results = []
for pred, true in zip(pred_rounded, true_int):
    acertos = len(set(pred) & set(true))
    results.append(acertos)
df_results = pd.DataFrame({'Real': true_int, 'Previsto': pred_rounded, 'Acertos': results})

print('--- Resultados dos primeiros sorteios ---')
print(df_results.head(10))
print(f'\nMAE: {mae:.2f} | RMSE: {rmse:.2f}')
print(f'Média de acertos: {np.mean(results):.2f}')
print(f'Máximo de acertos: {np.max(results)}')
print(f'Sorteios com 4+ acertos: {(np.array(results)>=4).sum()}')




Epoch 1/100
74/74 - 5s - 61ms/step - loss: 0.0358 - mae: 0.1520 - val_loss: 0.0332 - val_mae: 0.1452 - learning_rate: 1.0000e-03
Epoch 2/100
74/74 - 1s - 8ms/step - loss: 0.0334 - mae: 0.1466 - val_loss: 0.0332 - val_mae: 0.1457 - learning_rate: 1.0000e-03
Epoch 3/100
74/74 - 1s - 15ms/step - loss: 0.0332 - mae: 0.1462 - val_loss: 0.0330 - val_mae: 0.1458 - learning_rate: 1.0000e-03
Epoch 4/100
74/74 - 1s - 7ms/step - loss: 0.0332 - mae: 0.1461 - val_loss: 0.0331 - val_mae: 0.1461 - learning_rate: 1.0000e-03
Epoch 5/100
74/74 - 0s - 3ms/step - loss: 0.0331 - mae: 0.1460 - val_loss: 0.0330 - val_mae: 0.1441 - learning_rate: 1.0000e-03
Epoch 6/100
74/74 - 0s - 4ms/step - loss: 0.0330 - mae: 0.1457 - val_loss: 0.0331 - val_mae: 0.1454 - learning_rate: 1.0000e-03
Epoch 7/100
74/74 - 0s - 4ms/step - loss: 0.0329 - mae: 0.1455 - val_loss: 0.0330 - val_mae: 0.1460 - learning_rate: 1.0000e-03
Epoch 8/100
74/74 - 0s - 4ms/step - loss: 0.0329 - mae: 0.1454 - val_loss: 0.0331 - val_mae: 0.1454 - 
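
The cell above passes the test set as `validation_data`, so early stopping and the learning-rate schedule are tuned on the same rows later used for the reported metrics. A minimal sketch of a three-way chronological split follows. `chronological_split` and its default fractions are assumptions for illustration, not part of the notebook.

```python
def chronological_split(X, y, val_frac=0.1, test_frac=0.2):
    """Split time-ordered samples into train/validation/test without shuffling."""
    n = len(X)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    n_train = n - n_val - n_test
    train = (X[:n_train], y[:n_train])
    val = (X[n_train:n_train + n_val], y[n_train:n_train + n_val])
    test = (X[n_train + n_val:], y[n_train + n_val:])
    return train, val, test

# Toy data: 10 ordered samples
X_demo = list(range(10))
y_demo = [v * 2 for v in X_demo]
(train_X, train_y), (val_X, val_y), (test_X, test_y) = chronological_split(X_demo, y_demo)
print(train_X, val_X, test_X)  # [0, 1, 2, 3, 4, 5, 6] [7] [8, 9]
```

Slicing works the same way on NumPy arrays, so `X_seq` and `y_seq` could be passed directly. The validation part would then go to `validation_data`, with the test part held back for the final metrics.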

In [None]:
# --- Imports ---
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Input
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# --- Load data ---
df = pd.read_excel('Mega-Sena.xlsx')
balls = df[[f'Bola{i}' for i in range(1,7)]]

# --- Feature engineering: frequency, delay, moving averages ---
freq_global = balls.apply(pd.Series.value_counts).sum(axis=1)
freq_dict = freq_global.to_dict()
# DataFrame.map replaces the deprecated applymap (pandas >= 2.1)
balls_freq = balls.map(lambda x: freq_dict.get(x,0))

def calc_atrasos(balls):
    atrasos = {n: 0 for n in range(1,61)}
    matriz = []
    for _, sorteio in balls.iterrows():
        atual = []
        for n in sorteio:
            atual.append(atrasos[n])
            atrasos[n] = 0
        for n in atrasos.keys():
            if n not in sorteio.values:
                atrasos[n] += 1
        matriz.append(atual)
    return pd.DataFrame(matriz, columns=[f'Atraso{i}' for i in range(1,7)])
balls_atrasos = calc_atrasos(balls)

def moving_average_feature(balls, window=5):
    medias = []
    for col in balls.columns:
        col_vals = balls[col].tolist()
        col_media = [np.mean(col_vals[max(0,i-window):i]) if i > 0 else col_vals[0] for i in range(len(col_vals))]
        medias.append(col_media)
    return pd.DataFrame(np.array(medias).T, columns=[f'MedMovel{i}' for i in range(1,7)])
balls_medias = moving_average_feature(balls)

X = pd.concat([balls, balls_freq, balls_atrasos, balls_medias], axis=1)
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# --- Data for the LSTM ---
seq_len = 5
X_seq, y_seq = [], []
for i in range(len(X_scaled)-seq_len):
    X_seq.append(X_scaled[i:i+seq_len])
    y_seq.append(X_scaled[i+seq_len][:6])
X_seq = np.array(X_seq) # [samples, seq_len, n_features]
y_seq = np.array(y_seq)

# --- Train/test split (chronological, no shuffling) ---
X_train, X_test, y_train, y_test = train_test_split(X_seq, y_seq, test_size=0.2, shuffle=False)

# --- LSTM model ---
model = Sequential([
    Input(shape=(seq_len, X_seq.shape[2])),
    LSTM(64, return_sequences=False),
    Dense(32, activation='relu'),
    Dense(6, activation='sigmoid')
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
es = EarlyStopping(patience=10, restore_best_weights=True)
rlr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5)
model.fit(X_train, y_train, epochs=300, batch_size=32,
          validation_data=(X_test, y_test), verbose=2, callbacks=[es, rlr])

# --- Evaluation ---
y_pred_norm = model.predict(X_test)
# Denormalize only the drawn balls (first 6 columns)
n_features = X.shape[1]
y_pred_full = np.hstack([y_pred_norm, np.zeros((y_pred_norm.shape[0], n_features - 6))])
y_true_full = np.hstack([y_test, np.zeros((y_test.shape[0], n_features - 6))])
y_pred = scaler.inverse_transform(y_pred_full)[:, :6]
y_true = scaler.inverse_transform(y_true_full)[:, :6]

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
pred_rounded = [[int(round(x)) for x in linha] for linha in y_pred]
true_int    = [[int(round(x)) for x in linha] for linha in y_true]
results = [len(set(pred) & set(true)) for pred, true in zip(pred_rounded, true_int)]
df_results = pd.DataFrame({'Real': true_int, 'Previsto': pred_rounded, 'Acertos': results})

print('--- Resultados dos primeiros sorteios ---')
print(df_results.head(10))
print(f'\nMAE: {mae:.2f} | RMSE: {rmse:.2f}')
print(f'Média de acertos: {np.mean(results):.2f}')
print(f'Máximo de acertos: {np.max(results)}')
print(f'Sorteios com 4+ acertos: {(np.array(results)>=4).sum()}')




Epoch 1/300
74/74 - 3s - 34ms/step - loss: 0.0379 - mae: 0.1577 - val_loss: 0.0333 - val_mae: 0.1448 - learning_rate: 1.0000e-03
Epoch 2/300
74/74 - 1s - 7ms/step - loss: 0.0332 - mae: 0.1461 - val_loss: 0.0329 - val_mae: 0.1444 - learning_rate: 1.0000e-03
Epoch 3/300
74/74 - 0s - 6ms/step - loss: 0.0333 - mae: 0.1465 - val_loss: 0.0329 - val_mae: 0.1449 - learning_rate: 1.0000e-03
Epoch 4/300
74/74 - 0s - 6ms/step - loss: 0.0331 - mae: 0.1457 - val_loss: 0.0328 - val_mae: 0.1440 - learning_rate: 1.0000e-03
Epoch 5/300
74/74 - 0s - 6ms/step - loss: 0.0331 - mae: 0.1460 - val_loss: 0.0328 - val_mae: 0.1442 - learning_rate: 1.0000e-03
Epoch 6/300
74/74 - 0s - 6ms/step - loss: 0.0330 - mae: 0.1457 - val_loss: 0.0329 - val_mae: 0.1445 - learning_rate: 1.0000e-03
Epoch 7/300
74/74 - 0s - 6ms/step - loss: 0.0330 - mae: 0.1458 - val_loss: 0.0329 - val_mae: 0.1437 - learning_rate: 1.0000e-03
Epoch 8/300
74/74 - 0s - 6ms/step - loss: 0.0331 - mae: 0.1457 - val_loss: 0.0329 - val_mae: 0.1443 - l
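
The evaluation step pads predictions with zeros so that `scaler.inverse_transform` sees the full feature width. Min-max scaling works column by column, so the six ball columns can also be inverted directly from their own minimum and maximum. A minimal NumPy sketch follows, assuming the default `(0, 1)` feature range. `inverse_minmax` is a hypothetical helper and the ranges below are made-up values.

```python
import numpy as np

def inverse_minmax(y_scaled, col_min, col_max):
    """Undo (0, 1) min-max scaling column by column."""
    y_scaled = np.asarray(y_scaled, dtype=float)
    col_min = np.asarray(col_min, dtype=float)
    col_max = np.asarray(col_max, dtype=float)
    return y_scaled * (col_max - col_min) + col_min

# Made-up per-column ranges, only to illustrate the arithmetic
col_min = np.array([1.0, 2.0, 3.0])
col_max = np.array([41.0, 52.0, 63.0])
restored = inverse_minmax([[0.0, 0.5, 1.0]], col_min, col_max)
print(restored)  # [[ 1. 27. 63.]]
```

With a fitted scikit-learn `MinMaxScaler`, the ranges are available as `scaler.data_min_[:6]` and `scaler.data_max_[:6]`.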

In [None]:
# --- IMPORTS ---
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential, clone_model
from tensorflow.keras.layers import LSTM, Dense, Input
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
import tensorflow as tf

# --- LOAD DATA ---
df = pd.read_excel('Mega-Sena.xlsx')
balls = df[[f'Bola{i}' for i in range(1,7)]]

# --- FEATURE ENGINEERING ---
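# Historical frequency (counted over the full history, including later draws)
freq_global = balls.apply(pd.Series.value_counts).sum(axis=1)
freq_dict = freq_global.to_dict()
# DataFrame.map replaces the deprecated applymap (pandas >= 2.1)
balls_freq = balls.map(lambda x: freq_dict.get(x,0))

# Delay (draws since each number last appeared)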
def calc_atrasos(balls):
    atrasos = {n: 0 for n in range(1,61)}
    matriz = []
    for _, sorteio in balls.iterrows():
        atual = []
        for n in sorteio:
            atual.append(atrasos[n])
            atrasos[n] = 0
        for n in atrasos.keys():
            if n not in sorteio.values:
                atrasos[n] += 1
        matriz.append(atual)
    return pd.DataFrame(matriz, columns=[f'Atraso{i}' for i in range(1,7)])
balls_atrasos = calc_atrasos(balls)

# Moving average of each ball position (window=5)
def moving_average_feature(balls, window=5):
    medias = []
    for col in balls.columns:
        col_vals = balls[col].tolist()
        col_media = [np.mean(col_vals[max(0,i-window):i]) if i > 0 else col_vals[0] for i in range(len(col_vals))]
        medias.append(col_media)
    return pd.DataFrame(np.array(medias).T, columns=[f'MedMovel{i}' for i in range(1,7)])
balls_medias = moving_average_feature(balls)

# Final feature matrix
X = pd.concat([balls, balls_freq, balls_atrasos, balls_medias], axis=1)
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# --- PREPARE DATA FOR THE LSTM ---
seq_len = 5
X_seq, y_seq = [], []
for i in range(len(X_scaled)-seq_len):
    X_seq.append(X_scaled[i:i+seq_len])
    y_seq.append(X_scaled[i+seq_len][:6])
X_seq = np.array(X_seq) # [samples, seq_len, n_features]
y_seq = np.array(y_seq)

# --- TRAIN/TEST SPLIT ---
X_train, X_test, y_train, y_test = train_test_split(X_seq, y_seq, test_size=0.2, shuffle=False)

# --- LSTM MODEL DEFINITION ---
def make_lstm(seq_len, n_features):
    model = Sequential([
        Input(shape=(seq_len, n_features)),
        LSTM(64, return_sequences=False),
        Dense(32, activation='relu'),
        Dense(6, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    return model

# --- ENSEMBLE ---
n_models = 5
callbacks = [
    EarlyStopping(patience=25, restore_best_weights=True),
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5)
]

ensemble_preds = []
for seed in range(n_models):
    print(f"Treinando modelo {seed+1}/{n_models}")
    np.random.seed(seed)
    tf.random.set_seed(seed)
    model = make_lstm(seq_len, X_seq.shape[2])
    model.fit(X_train, y_train, epochs=300, batch_size=32,
              validation_data=(X_test, y_test),
              verbose=0, callbacks=callbacks)
    preds = model.predict(X_test)
    ensemble_preds.append(preds)

# --- AVERAGE THE PREDICTIONS ---
ensemble_preds = np.array(ensemble_preds)
y_pred_norm_ensemble = np.mean(ensemble_preds, axis=0)

# --- DENORMALIZE AND EVALUATE ---
n_full_features = X.shape[1]
y_pred_full = np.hstack([y_pred_norm_ensemble, np.zeros((y_pred_norm_ensemble.shape[0], n_full_features - 6))])
y_true_full = np.hstack([y_test, np.zeros((y_test.shape[0], n_full_features - 6))])
y_pred = scaler.inverse_transform(y_pred_full)[:, :6]
y_true = scaler.inverse_transform(y_true_full)[:, :6]

pred_rounded = [[int(round(x)) for x in linha] for linha in y_pred]
true_int    = [[int(round(x)) for x in linha] for linha in y_true]
results = [len(set(pred) & set(true)) for pred, true in zip(pred_rounded, true_int)]
df_results = pd.DataFrame({'Real': true_int, 'Previsto': pred_rounded, 'Acertos': results})

# --- FINAL REPORT ---
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print('--- Resultados dos primeiros sorteios do ensemble ---')
print(df_results.head(10))
print(f'\nMAE: {mae:.2f} | RMSE: {rmse:.2f}')
print(f'Média de acertos: {np.mean(results):.2f}')
print(f'Máximo de acertos: {np.max(results)}')
print(f'Sorteios com 4+ acertos: {(np.array(results)>=4).sum()}')




Treinando modelo 1/5
19/19 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step
Treinando modelo 2/5
19/19 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step
Treinando modelo 3/5
19/19 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step
Treinando modelo 4/5
19/19 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step
Treinando modelo 5/5
19/19 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step
--- Resultados dos primeiros sorteios do ensemble ---
                       Real                 Previsto  Acertos
0   [3, 19, 34, 41, 48, 53]  [9, 17, 26, 35, 44, 52]        0
1   [6, 18, 25, 30, 42, 54]  [8, 17, 26, 35, 43, 52]        0
2   [7, 30, 31, 41, 50, 56]  [8, 17, 26, 35, 43, 52]        0
3   [3, 10, 25, 36, 51, 58]
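
The ensemble predicts nearly the same six values for every draw. For context, the expected value of the k-th smallest of n numbers drawn without replacement from 1..N is k(N+1)/(n+1), a standard order-statistics result. The sketch below evaluates it for the Mega-Sena (N=60, n=6), assuming the columns `Bola1`..`Bola6` hold each draw in ascending order, as the rows shown suggest. `expected_sorted_draw` is a hypothetical helper.

```python
from fractions import Fraction

def expected_sorted_draw(universe_size=60, balls=6):
    """Expected value of each position in a sorted draw without replacement."""
    return [Fraction(k * (universe_size + 1), balls + 1) for k in range(1, balls + 1)]

expected = expected_sorted_draw()
rounded = [round(v) for v in expected]
print([round(float(v), 2) for v in expected])  # [8.71, 17.43, 26.14, 34.86, 43.57, 52.29]
print(rounded)                                 # [9, 17, 26, 35, 44, 52]
```

The rounded values coincide with the first predicted row in the output above. This suggests the models converged toward per-position averages rather than learning anything specific to each draw.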

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Input, LSTM, Conv1D, Flatten, Permute, Multiply, Activation, Lambda
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
import tensorflow as tf
from tensorflow.keras import backend as K

# --- LOAD DATA AND FEATURE ENGINEERING ---
df = pd.read_excel('Mega-Sena.xlsx')
balls = df[[f'Bola{i}' for i in range(1,7)]]
freq_global = balls.apply(pd.Series.value_counts).sum(axis=1)
freq_dict = freq_global.to_dict()
balls_freq = balls.map(lambda x: freq_dict.get(x,0)) # DataFrame.map replaces the deprecated applymap (pandas >= 2.1)

def calc_atrasos(balls):
    atrasos = {n: 0 for n in range(1,61)}
    matriz = []
    for _, sorteio in balls.iterrows():
        atual = []
        for n in sorteio:
            atual.append(atrasos[n])
            atrasos[n] = 0
        for n in atrasos.keys():
            if n not in sorteio.values:
                atrasos[n] += 1
        matriz.append(atual)
    return pd.DataFrame(matriz, columns=[f'Atraso{i}' for i in range(1,7)])

def moving_average_feature(balls, window=5):
    medias = []
    for col in balls.columns:
        col_vals = balls[col].tolist()
        col_media = [np.mean(col_vals[max(0,i-window):i]) if i > 0 else col_vals[0] for i in range(len(col_vals))]
        medias.append(col_media)
    return pd.DataFrame(np.array(medias).T, columns=[f'MedMovel{i}' for i in range(1,7)])
balls_atrasos = calc_atrasos(balls)
balls_medias = moving_average_feature(balls)

X = pd.concat([balls, balls_freq, balls_atrasos, balls_medias], axis=1)
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# --- SEQUENCE PREPARATION ---
seq_len = 5
X_seq, y_seq = [], []
for i in range(len(X_scaled)-seq_len):
    X_seq.append(X_scaled[i:i+seq_len])
    y_seq.append(X_scaled[i+seq_len][:6])
X_seq = np.array(X_seq)
y_seq = np.array(y_seq)
X_train, X_test, y_train, y_test = train_test_split(X_seq, y_seq, test_size=0.2, shuffle=False)

# --- MLP ---
X_train_flat = X_train.reshape(X_train.shape[0], -1)
X_test_flat = X_test.reshape(X_test.shape[0], -1)

model_mlp = Sequential([
    Input(shape=(X_train_flat.shape[1],)),
    Dense(128, activation='relu'),
    Dense(32, activation='relu'),
    Dense(6, activation='sigmoid')
])
model_mlp.compile(optimizer='adam', loss='mse', metrics=['mae'])
model_mlp.fit(X_train_flat, y_train, epochs=100, batch_size=32,
              validation_data=(X_test_flat, y_test), verbose=2, callbacks=[EarlyStopping(patience=10, restore_best_weights=True)])

# --- CNN 1D + LSTM ---
model_cnn_lstm = Sequential([
    Input(shape=(seq_len, X_seq.shape[2])),
    Conv1D(filters=64, kernel_size=2, activation='relu'),
    LSTM(32, return_sequences=False),
    Dense(6, activation='sigmoid')
])
model_cnn_lstm.compile(optimizer='adam', loss='mse', metrics=['mae'])
model_cnn_lstm.fit(X_train, y_train, epochs=100, batch_size=32,
                   validation_data=(X_test, y_test), verbose=2, callbacks=[EarlyStopping(patience=10, restore_best_weights=True)])

# --- Attention ---
seq_input = Input(shape=(seq_len, X_seq.shape[2]))
x = LSTM(64, return_sequences=True)(seq_input)

# Calculate attention scores
attention_scores = Dense(1, activation='tanh')(x)  # (None, seq_len, 1)

# Flatten the scores and map them to one weight per time step.
# Note: Dense(seq_len, softmax) learns a remix of the scores; it is not a plain softmax over them.
attention_weights_flat = Flatten()(attention_scores) # (None, seq_len)
attention_weights = Dense(seq_len, activation='softmax')(attention_weights_flat) # (None, seq_len)

# Reshape attention_weights to (None, seq_len, 1) so they broadcast against x; the Lambda wraps the backend op as a layer
attention_weights_expanded = Lambda(lambda z: K.expand_dims(z, axis=-1))(attention_weights)

# Perform element-wise multiplication
# x (None, seq_len, 64) * attention_weights_expanded (None, seq_len, 1) -> (None, seq_len, 64)
weighted_output = Multiply()([x, attention_weights_expanded])

# Flatten the weighted output for the final Dense layer
x = Flatten()(weighted_output) # (None, seq_len * 64)

output = Dense(6, activation='sigmoid')(x)
model_attention = Model(seq_input, output)
model_attention.compile(optimizer='adam', loss='mse', metrics=['mae'])
model_attention.fit(X_train, y_train, epochs=100, batch_size=32,
                   validation_data=(X_test, y_test), verbose=2, callbacks=[EarlyStopping(patience=10, restore_best_weights=True)])

# --- SHARED EVALUATION FOR ALL MODELS ---
def avaliar(model, X_input, y_true_target, tipo='default'):
    # `tipo` is kept so existing calls still work; prediction is the same for every model
    y_pred_norm = model.predict(X_input)

    n_total_features_for_inverse_transform = X.shape[1]

    y_pred_full = np.hstack([y_pred_norm, np.zeros((y_pred_norm.shape[0], n_total_features_for_inverse_transform - 6))])
    y_true_full = np.hstack([y_true_target, np.zeros((y_true_target.shape[0], n_total_features_for_inverse_transform - 6))])

    y_pred = scaler.inverse_transform(y_pred_full)[:, :6]
    y_true = scaler.inverse_transform(y_true_full)[:, :6]
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    pred_rounded = [[int(round(x)) for x in linha] for linha in y_pred]
    true_int    = [[int(round(x)) for x in linha] for linha in y_true]
    results = [len(set(pred) & set(true)) for pred, true in zip(pred_rounded, true_int)]
    df_results = pd.DataFrame({'Real': true_int, 'Previsto': pred_rounded, 'Acertos': results})
    print(df_results.head(10))
    print(f'\nMAE: {mae:.2f} | RMSE: {rmse:.2f}')
    print(f'Média de acertos: {np.mean(results):.2f}')
    print(f'Máximo de acertos: {np.max(results)}')
    print(f'Sorteios com 4+ acertos: {(np.array(results)>=4).sum()}')

print('Resultados MLP:')
avaliar(model_mlp, X_test_flat, y_test, tipo='mlp')
print('\nResultados CNN-LSTM:')
avaliar(model_cnn_lstm, X_test, y_test)
print('\nResultados Attention:')
avaliar(model_attention, X_test, y_test)

Epoch 1/100
74/74 - 4s - 55ms/step - loss: 0.0377 - mae: 0.1555 - val_loss: 0.0335 - val_mae: 0.1471
Epoch 2/100
74/74 - 3s - 37ms/step - loss: 0.0335 - mae: 0.1466 - val_loss: 0.0334 - val_mae: 0.1465
Epoch 3/100
74/74 - 0s - 4ms/step - loss: 0.0333 - mae: 0.1463 - val_loss: 0.0334 - val_mae: 0.1464
Epoch 4/100
74/74 - 0s - 4ms/step - loss: 0.0332 - mae: 0.1460 - val_loss: 0.0334 - val_mae: 0.1462
Epoch 5/100
74/74 - 0s - 3ms/step - loss: 0.0331 - mae: 0.1457 - val_loss: 0.0334 - val_mae: 0.1462
Epoch 6/100
74/74 - 0s - 3ms/step - loss: 0.0330 - mae: 0.1455 - val_loss: 0.0334 - val_mae: 0.1460
Epoch 7/100
74/74 - 0s - 4ms/step - loss: 0.0329 - mae: 0.1454 - val_loss: 0.0334 - val_mae: 0.1462
Epoch 8/100
74/74 - 0s - 3ms/step - loss: 0.0328 - mae: 0.1451 - val_loss: 0.0334 - val_mae: 0.1460
Epoch 9/100
74/74 - 0s - 4ms/step - loss: 0.0327 - mae: 0.1449 - val_loss: 0.0335 - val_mae: 0.1461
Epoch 10/100
74/74 - 0s - 3ms/step - loss: 0.0326 - mae: 0.1448 - val_loss: 0.0334 - val_mae: 0.14
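
The attention branch in the cell above weights each LSTM time step before the final dense layer. As a reference for that idea, here is a minimal NumPy sketch of plain softmax weighting over time steps. It is an illustration under assumed shapes and random inputs, not a reproduction of the Keras graph; `attention_weighting` is a hypothetical helper.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - np.max(scores, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def attention_weighting(hidden, scores):
    """hidden: (batch, seq_len, units); scores: (batch, seq_len)."""
    weights = softmax(scores, axis=1)          # one weight per time step, summing to 1
    weighted = hidden * weights[:, :, None]    # broadcast the weights over the unit axis
    return weights, weighted

rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 5, 4))   # assumed shapes: batch=2, seq_len=5, units=4
scores = rng.normal(size=(2, 5))
weights, weighted = attention_weighting(hidden, scores)
print(weights.sum(axis=1))   # each row sums to 1
print(weighted.shape)        # (2, 5, 4)
```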

In [None]:
import numpy as np
import pandas as pd

# Load the real historical data
df = pd.read_excel('Mega-Sena.xlsx')
historico_sorteios = df[[f'Bola{i}' for i in range(1,7)]].values

# Mega-Sena parameters
n_simulacoes = 100000
numeros_possiveis = np.arange(1, 61)
bolas_por_sorteio = 6

# Simulate random draws
def simular_sorteios(n_sims, bolas, universo):
    np.random.seed(42)
    sorteios = [np.sort(np.random.choice(universo, bolas, replace=False)) for _ in range(n_sims)]
    return np.array(sorteios)

simulados = simular_sorteios(n_simulacoes, bolas_por_sorteio, numeros_possiveis)

# Frequency of each number in the real draws
flat_real = historico_sorteios.flatten()
freq_real = pd.Series(flat_real).value_counts().sort_index()

# Frequency of each number in the simulated draws
flat_mc = simulados.flatten()
freq_mc = pd.Series(flat_mc).value_counts().sort_index()

# Comparison table
df_comp = pd.DataFrame({
    'MegaSena_Reais': freq_real,
    'MonteCarlo': freq_mc
})
print('--- Frequência comparativa Mega-Sena vs Monte Carlo ---')
print(df_comp)

# Analysis: how many hits a fixed ticket would have scored in each real draw
meu_jogo = np.array([2, 14, 23, 32, 44, 51])
acertos_reais = [len(set(sorteio) & set(meu_jogo)) for sorteio in historico_sorteios]
freq_acertos_reais = pd.Series(acertos_reais).value_counts().sort_index()
print('-- Frequência de acertos em apostas reais --')
print(freq_acertos_reais)

# Same ticket against the Monte Carlo draws
acertos_mc = [len(set(sorteio) & set(meu_jogo)) for sorteio in simulados]
freq_acertos_mc = pd.Series(acertos_mc).value_counts().sort_index()
print('-- Frequência de acertos em apostas simuladas --')
print(freq_acertos_mc)


--- Frequência comparativa Mega-Sena vs Monte Carlo ---
    MegaSena_Reais  MonteCarlo
1              285       10107
2              294       10011
3              272        9984
4              311        9943
5              320       10030
6              292       10020
7              274        9703
8              289        9978
9              279        9999
10             343       10106
11             310       10018
12             278        9745
13             302       10032
14             286        9929
15             262        9933
16             305        9975
17             311        9964
18             281        9812
19             287       10021
20             287       10011
21             244       10218
22             262       10012
23             307       10052
24             294       10069
25             293       10046
26             243        9949
27             311        9866
28             303        9954
29             293        9891
30           
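
To read the table above, it helps to know how much the counts should vary by chance alone. Each number is included in a fair draw with probability 6/60, so its count over D independent draws follows a Binomial(D, 0.1) distribution. A short sketch of the resulting mean and standard deviation follows; `expected_count_and_sd` is a hypothetical helper.

```python
import math

def expected_count_and_sd(n_draws, balls=6, universe=60):
    """Mean and standard deviation of one number's count over n_draws fair draws."""
    p = balls / universe
    return n_draws * p, math.sqrt(n_draws * p * (1 - p))

mc_mean, mc_sd = expected_count_and_sd(100_000)
print(f"{mc_mean:.0f} +/- {mc_sd:.1f}")  # 10000 +/- 94.9
```

Applying the same helper to `len(historico_sorteios)` gives the comparable range for the real-draw column, which is on a different scale because the history holds far fewer draws than the simulation.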

In [None]:
import numpy as np
import pandas as pd

# Load the real data
df = pd.read_excel('Mega-Sena.xlsx')
historico_sorteios = df[[f'Bola{i}' for i in range(1,7)]].values

# The 6 most frequent numbers in the real history
flat_real = historico_sorteios.flatten()
top_6_freq = pd.Series(flat_real).value_counts().nlargest(6).index.tolist()
print(f'Predição por frequência: {top_6_freq}')

# Monte Carlo simulation
n_simulacoes = 100000
numeros_possiveis = np.arange(1, 61)
bolas_por_sorteio = 6

def simular_sorteios(n_sims, bolas, universo):
    np.random.seed(42)
    sorteios = [np.sort(np.random.choice(universo, bolas, replace=False)) for _ in range(n_sims)]
    return np.array(sorteios)

simulados = simular_sorteios(n_simulacoes, bolas_por_sorteio, numeros_possiveis)

# Test the frequency-based ticket against the simulated draws
acertos_mc_pred = [len(set(jogo) & set(top_6_freq)) for jogo in simulados]
freq_acertos_mc_pred = pd.Series(acertos_mc_pred).value_counts().sort_index()
print('-- Distribuição de acertos da aposta preditiva (MC) --')
print(freq_acertos_mc_pred)

# Test the frequency-based ticket against the real draws
acertos_reais_pred = [len(set(sorteio) & set(top_6_freq)) for sorteio in historico_sorteios]
freq_acertos_reais_pred = pd.Series(acertos_reais_pred).value_counts().sort_index()
print('-- Distribuição de acertos da aposta preditiva (Real) --')
print(freq_acertos_reais_pred)

# For the real data, count the draws where this ticket scored 4, 5, or 6 hits
print('Concursos reais com 4+:', (np.array(acertos_reais_pred) >= 4).sum())


Predição por frequência: [10, 53, 34, 5, 37, 38]
-- Distribuição de acertos da aposta preditiva (MC) --
0    51486
1    37905
2     9530
3     1041
4       38
Name: count, dtype: int64
-- Distribuição de acertos da aposta preditiva (Real) --
0    1417
1    1147
2     325
3      48
4       2
Name: count, dtype: int64
Concursos reais com 4+: 2
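
The Monte Carlo distribution above can be checked against the exact answer. For any fixed 6-number ticket, the number of hits in a fair 6-of-60 draw follows a hypergeometric distribution, P(k) = C(6,k)·C(54,6−k)/C(60,6). A minimal sketch follows; `hit_probabilities` is a hypothetical helper.

```python
from math import comb

def hit_probabilities(universe=60, drawn=6, ticket=6):
    """Exact probability of k hits for a fixed ticket in a fair draw."""
    total = comb(universe, drawn)
    return {
        k: comb(ticket, k) * comb(universe - ticket, drawn - k) / total
        for k in range(min(ticket, drawn) + 1)
    }

probs = hit_probabilities()
for k, p in probs.items():
    print(f"{k} hits: p = {p:.6f}, expected in 100000 simulations = {p * 100_000:.0f}")
```

The exact expectations for 0 and 1 hits, roughly 51,588 and 37,902 per 100,000 draws, are close to the simulated counts of 51,486 and 37,905. The real-history distribution is not a fair comparison, because `top_6_freq` was selected from those same draws.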
